Publications
2024
Unifying Multimodal Retrieval via Document Screenshot Embedding
Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 (upcoming).
[Paper]
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin
The 38th Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 (upcoming).
[Paper]
Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky
The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024.
[Paper]
CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders
Xinyu Zhang*, Minghan Li*, Jimmy Lin
North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
[Paper]
2023
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
[Paper] / [Code]
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
Minghan Li*, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen*
Association for Computational Linguistics (ACL), 2023.
[Paper] / [Code]
SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023.
[Paper] / [Code]
Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval
Sheng-Chieh Lin, Minghan Li, Jimmy Lin
Transactions of the Association for Computational Linguistics (TACL), 2023.
[Paper] / [Code]
2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
Minghan Li*, Xinyu Zhang*, Ji Xin, Hongyang Zhang, Jimmy Lin
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
[Paper] / [Code]
An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering
Minghan Li, Xueguang Ma, Jimmy Lin
The 2nd Workshop on Trustworthy Natural Language Processing (NAACL), 2022.
[Paper]
2021
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
Xueguang Ma*, Minghan Li*, Kai Sun, Ji Xin, Jimmy Lin
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
[Paper] / [Code]
Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering
Minghan Li*, Ming Li, Kun Xiong, Jimmy Lin
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
[Paper] / [Code]
Another Look at DPR: Reproduction of Training and Replication of Retrieval
Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li, Jimmy Lin
European Conference on Information Retrieval (ECIR), 2022.
[Paper] / [Code]