Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Neural networks for pattern recognition
Neural networks for pattern recognition
Information Processing and Management: an International Journal - Special issue: history of information science
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
The Importance of Prior Probabilities for Entry Page Search
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Document normalization revisited
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
Probabilistic models of indexing and searching
SIGIR '80 Proceedings of the 3rd annual ACM conference on Research and development in information retrieval
Applying Machine Learning to Text Segmentation for Information Retrieval
Information Retrieval
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Document Length Normalization by Statistical Regression
ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
An analysis on document length retrieval trends in language modeling smoothing
Information Retrieval
Revisiting the relationship between document length and relevance
Proceedings of the 17th ACM conference on Information and knowledge management
Probabilistic document length priors for language models
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Hi-index | 0.00 |
Classical probabilistic information retrieval (IR) models, e.g. BM25, deal with document length based on a trade-off between the Verbosity hypothesis, which assumes the independence of a document's relevance of its length, and the Scope hypothesis, which assumes the opposite. Despite the effectiveness of the classical probabilistic models, the potential relationship between document length and relevance is not fully explored to improve retrieval performance. In this paper, we conduct an in-depth study of this relationship based on the Scope hypothesis that document length does have its impact on relevance. We study a list of probability density functions and examine which of the density functions fits the best to the actual distribution of the document length. Based on the studied probability density functions, we propose a length-based BM25 relevance weighting model, called BM25L, which incorporates document length as a substantial weighting factor. Extensive experiments conducted on standard TREC collections show that our proposed BM25L markedly outperforms the original BM25 model, even if the latter is optimized.