Publications
2024
Unifying Multimodal Retrieval via Document Screenshot Embedding
EMNLP 2024 Conference
We encode document screenshots directly into vectors with a vision-language model for semantic search, unifying retrieval across multimodal document formats.
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
NeurIPS 2024 Conference
NEST improves the factuality and attribution of LLM generation through nearest neighbor speculative decoding.
Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
SIGIR 2024 Conference
We investigate whether query expansion can improve the generalization of strong cross-encoder rankers.
CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders
NAACL 2024 Conference
CELI is a simple yet effective approach to improving the out-of-domain generalization of cross-encoders.
2023
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
EMNLP 2023 Conference
DRAGON uses diverse data augmentation to improve the generalization of dense retrievers across domains.
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
ACL 2023 Conference
CITADEL provides an efficient multi-vector retriever that is about 40x faster than ColBERT-v2 on GPUs through dynamic lexical routing.
SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
SIGIR 2023 Conference
SLIM reduces the latency and storage of ColBERT while being fully compatible with Pyserini (Lucene-based) indexing systems.
2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
EMNLP 2022 Conference
We provide certified error control for candidate set pruning in two-stage relevance ranking systems.
An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering
TrustNLP 2022 Workshop
We analyze encoder attributions in Dense Passage Retriever for open-domain question answering tasks.
2021
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
EMNLP 2021 Conference
We propose simple and effective unsupervised methods to compress dense vectors for passage retrieval.
Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering
EMNLP 2021 Conference
We develop multi-task dense retrieval methods using model uncertainty fusion for open-domain question answering.