Probabilistic models in information retrieval
The Computer Journal - Special issue on information retrieval
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing
Communications of the ACM
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of axiomatic approaches to information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Semantic term matching in axiomatic approaches to information retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of proximity measures in information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Artificial Intelligence Review
A statistical approach to mechanized encoding and searching of literary information
IBM Journal of Research and Development
Information-based models for ad hoc IR
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Diagnostic Evaluation of Information Retrieval Models
ACM Transactions on Information Systems (TOIS)
When documents are very long, BM25 fails!
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Adaptive term frequency normalization for BM25
Proceedings of the 20th ACM international conference on Information and knowledge management
Adaptive term frequency normalization for BM25
Proceedings of the 20th ACM international conference on Information and knowledge management
A log-logistic model-based interpretation of TF normalization of BM25
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
An exploration of ranking heuristics in mobile local search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Query likelihood with negative query generation
Proceedings of the 21st ACM international conference on Information and knowledge management
A constraint to automatically regulate document-length normalisation
Proceedings of the 21st ACM international conference on Information and knowledge management
Composition of TF normalizations: new insights on scoring functions for ad hoc IR
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Tie Breaker: A Novel Way of Combining Retrieval Signals
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Towards efficient indexing of arbitrary similarity: vision paper
ACM SIGMOD Record
Graph-of-word and TW-IDF: new approach to ad hoc IR
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Document Score Distribution Models for Query Performance Inference and Prediction
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly; as a result, very long documents tend to be overly penalized. In order to analytically diagnose this problem, we propose two desirable formal constraints to capture the heuristic of lower-bounding TF, and use constraint analysis to examine several representative retrieval functions. Analysis results show that all these retrieval functions can only satisfy the constraints for a certain range of parameter values and/or for a particular set of query terms. Empirical results further show that the retrieval performance tends to be poor when the parameter is out of the range or the query term is not in the particular set. To solve this common problem, we propose a general and efficient method to introduce a sufficiently large lower bound for TF normalization which can be shown analytically to fix or alleviate the problem. Our experimental results demonstrate that the proposed method, incurring almost no additional computational cost, can be applied to state-of-the-art retrieval functions, such as Okapi BM25, language models, and the divergence from randomness approach, to significantly improve the average precision, especially for verbose queries.