Publications

2024

Unifying Multimodal Retrieval via Document Screenshot Embedding

EMNLP 2024 Conference
Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin
We directly encode document screenshots into vectors using a vision-language model for semantic search, unifying multimodal retrieval approaches.

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

NeurIPS 2024 Conference
Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin
NEST improves the factuality and attribution of LLM generation through nearest neighbor speculative decoding.

Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?

SIGIR 2024 Conference
Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky
We investigate how query expansion techniques can improve the generalization capabilities of strong cross-encoder ranking models.

CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders

NAACL 2024 Conference
Xinyu Zhang*, Minghan Li*, Jimmy Lin
CELI provides a simple yet effective approach to improve cross-encoder generalization across different domains.

2023

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

EMNLP 2023 Conference
Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
DRAGON introduces diverse augmentation techniques to improve the generalization of dense retrieval systems across different domains.

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

ACL 2023 Conference
Minghan Li*, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen*
CITADEL provides an efficient multi-vector retriever that is about 40x faster than ColBERT-v2 on GPUs through dynamic lexical routing.

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

SIGIR 2023 Conference
Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
SLIM reduces the latency and storage cost of ColBERT while remaining fully compatible with Pyserini's Lucene-based indexing.

Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval

TACL 2023 Journal
Sheng-Chieh Lin, Minghan Li, Jimmy Lin
Aggretriever presents a simple approach to aggregate textual representations for building more robust dense passage retrieval systems.

2022

Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking

EMNLP 2022 Conference
Minghan Li*, Xinyu Zhang*, Ji Xin, Hongyang Zhang, Jimmy Lin
We provide certified error control for candidate set pruning in two-stage relevance ranking systems.

An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering

TrustNLP 2022 Workshop
Minghan Li, Xueguang Ma, Jimmy Lin
We analyze encoder attributions in Dense Passage Retriever for open-domain question answering tasks.

2021

Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval

EMNLP 2021 Conference
Xueguang Ma*, Minghan Li*, Kai Sun, Ji Xin, Jimmy Lin
We propose simple and effective unsupervised methods to compress dense vectors for passage retrieval.

Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering

EMNLP 2021 Conference
Minghan Li*, Ming Li, Kun Xiong, Jimmy Lin
We develop multi-task dense retrieval methods using model uncertainty fusion for open-domain question answering.

Another Look at DPR: Reproduction of Training and Replication of Retrieval

ECIR 2022 Conference
Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li, Jimmy Lin
We reproduce DPR training and replicate retrieval results to provide insights into dense passage retrieval.