We show that the Okapi BM25 retrieval function tends to over-penalize very long documents. To address this problem, we present BM25L, a simple yet effective extension of BM25 that "shifts" the term frequency normalization formula to boost the scores of very long documents. Our experiments show that BM25L is more effective and robust than standard BM25 at the same computational cost.
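The abstract does not state the formula, so the following is only a minimal sketch of what such a "shift" might look like. It assumes the commonly cited form of BM25L, in which the length-normalized term frequency is increased by a constant before saturation is applied. The function names, the parameter name `delta`, and the default values (`k1=1.2`, `b=0.75`, `delta=0.5`) are illustrative assumptions, and the IDF factor is omitted so that only the term frequency component is compared.

```python
def length_norm(doc_len, avg_doc_len, b=0.75):
    """Pivoted length normalization factor shared by both variants."""
    return 1.0 - b + b * doc_len / avg_doc_len


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term frequency component of standard BM25 (IDF omitted)."""
    if tf <= 0:
        return 0.0
    # Length-normalized term frequency; approaches 0 for very long documents.
    ctd = tf / length_norm(doc_len, avg_doc_len, b)
    return (k1 + 1.0) * ctd / (k1 + ctd)


def bm25l_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=0.5):
    """Shifted term frequency component (assumed BM25L form, IDF omitted)."""
    if tf <= 0:
        return 0.0  # unmatched terms still contribute nothing
    ctd = tf / length_norm(doc_len, avg_doc_len, b)
    # The shift by delta keeps the component above a positive floor
    # whenever the term occurs at all, however long the document is.
    return (k1 + 1.0) * (ctd + delta) / (k1 + ctd + delta)


if __name__ == "__main__":
    # Hypothetical collection: average length 100 tokens, one very long document.
    for doc_len in (100, 1_000, 100_000):
        print(doc_len, bm25_tf(1, doc_len, 100), bm25l_tf(1, doc_len, 100))
```

Under these assumptions, the standard component for a single term occurrence shrinks toward zero as the document grows, while the shifted component never falls below `(k1 + 1) * delta / (k1 + delta)`. Both variants perform the same handful of arithmetic operations per matched term, which is consistent with the claim of equal computational cost.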