Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Simple BM25 extension to multiple weighted fields
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Optimisation methods for ranking functions with multiple parameters
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
On setting the hyper-parameters of term frequency normalization for information retrieval
ACM Transactions on Information Systems (TOIS)
An exploration of proximity measures in information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A new probabilistic retrieval model based on the dirichlet compound multinomial distribution
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Regression Rank: Learning to Meet the Opportunity of Descriptive Queries
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Bridging Language Modeling and Divergence from Randomness Models: A Log-Logistic Model for IR
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
A machine learning approach for improved BM25 retrieval
Proceedings of the 18th ACM conference on Information and knowledge management
A statistical approach to mechanized encoding and searching of literary information
IBM Journal of Research and Development
Learning concept importance using a weighted dependence model
Proceedings of the third ACM international conference on Web search and data mining
How good is a span of terms?: exploiting proximity to improve web retrieval
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Information-based models for ad hoc IR
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
When documents are very long, BM25 fails!
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Lower-bounding term frequency normalization
Proceedings of the 20th ACM international conference on Information and knowledge management
Adaptive term frequency normalization for BM25
Proceedings of the 20th ACM international conference on Information and knowledge management
Extending BM25 with multiple query operators
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Estimation of the collection parameter of information models for IR
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
The effectiveness of BM25 retrieval function is mainly due to its sub-linear term frequency (TF) normalization component, which is controlled by a parameter k1. Although BM25 was derived based on the classic probabilistic retrieval model, it has been so far unclear how to interpret its parameter k1 probabilistically, making it hard to optimize the setting of this parameter. In this paper, we provide a novel probabilistic interpretation of the BM25 TF normalization and its parameter k1 based on a log-logistic model for the probability of seeing a document in the collection with a given level of TF. The proposed interpretation allows us to derive different approaches to estimation of parameter k1 based solely on the current collection without requiring any training data, thus effectively eliminating one free parameter from BM25. Our experiment results show that the proposed approaches can accurately predict the optimal k1 without requiring training data and achieve better or comparable retrieval performance to a well-tuned BM25 where k1 is optimized based on training data.