Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
Abstract
Stratified sampling improves knowledge distillation by preserving the full range of teacher scores, outperforming traditional sampling methods in retrieval tasks.
Transferring knowledge from a cross-encoder teacher via Knowledge Distillation (KD) has become a standard paradigm for training retrieval models. While existing studies have largely focused on mining hard negatives to improve discrimination, the systematic composition of training data and the resulting teacher score distribution have received comparatively little attention. In this work, we highlight that focusing solely on hard negatives prevents the student from learning the comprehensive preference structure of the teacher, potentially hampering generalization. To effectively emulate the teacher score distribution, we propose a Stratified Sampling strategy that uniformly covers the entire score spectrum. Experiments on in-domain and out-of-domain benchmarks confirm that Stratified Sampling, which preserves the variance and entropy of teacher scores, serves as a robust baseline, significantly outperforming top-K and random sampling in diverse settings. These findings suggest that the essence of distillation lies in preserving the diverse range of relative scores perceived by the teacher.
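The abstract does not spell out the sampling procedure, but the idea it describes can be illustrated with a minimal sketch: given candidate passages with precomputed cross-encoder teacher scores, split the observed score range into equal-width bins and draw a few candidates from each bin, so the sampled negatives span the full score spectrum rather than only the top of it. The function name `stratified_sample`, the bin count, and the per-bin sample size below are illustrative assumptions, not details from the paper.

```python
import random
from typing import List, Tuple

def stratified_sample(
    candidates: List[Tuple[str, float]],  # (passage_id, teacher_score) pairs
    num_bins: int = 8,
    per_bin: int = 2,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Sample candidates uniformly across the teacher-score range.

    Instead of keeping only the highest-scored (hard) negatives, partition
    the observed score range into equal-width bins and draw a few candidates
    from each, so the sampled set preserves the spread of teacher scores.
    """
    rng = random.Random(seed)
    scores = [s for _, s in candidates]
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # guard against identical scores

    # Group candidates by the bin their teacher score falls into.
    bins = [[] for _ in range(num_bins)]
    for pid, s in candidates:
        idx = min(int((s - lo) / width), num_bins - 1)
        bins[idx].append((pid, s))

    # Draw up to `per_bin` items from every non-empty bin.
    sampled = []
    for b in bins:
        if b:
            sampled.extend(rng.sample(b, min(per_bin, len(b))))
    return sampled
```

By contrast, a top-K strategy would simply take the K highest-scoring candidates, and random sampling would ignore the scores entirely; the stratified variant is the one the paper reports as preserving the variance and entropy of teacher scores.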
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Reproducing and Comparing Distillation Techniques for Cross-Encoders (2026)
- Training Dense Retrievers with Multiple Positive Passages (2026)
- ECI: Effective Contrastive Information to Evaluate Hard-Negatives (2026)
- CLIP-RD: Relational Distillation for Efficient CLIP Knowledge Distillation (2026)
- Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion (2026)
- Learning Retrieval Models with Sparse Autoencoders (2026)
- TabKD: Tabular Knowledge Distillation through Interaction Diversity of Learned Feature Bins (2026)