Document length is widely recognized as an important factor in tuning retrieval systems. Many models tend to favor either short or long documents, so a length-based correction is needed to avoid length bias. In Language Modeling for Information Retrieval, smoothing methods move probability mass from the terms seen in a document to unseen words, and the amount moved often depends on document length. In this article, we perform an in-depth study of this behavior, characterized by the document length retrieval trends, for three popular smoothing methods across a number of factors, and we examine its impact on the length of the documents retrieved and on retrieval performance. We first analyze the Jelinek-Mercer, Dirichlet prior and two-stage smoothing strategies theoretically and then conduct an empirical analysis. Our analysis shows that Dirichlet prior smoothing accounts for document length more appropriately than Jelinek-Mercer smoothing, which leads to its superior retrieval performance. In a follow-up analysis, we posit that length-based priors can be used to offset any bias in the length retrieval trends that stems from the retrieval formula derived from the smoothing technique. We show that the performance of Jelinek-Mercer smoothing can be significantly improved by using such a prior, which provides a natural and simple way to decouple the query modeling and document modeling roles of smoothing. The analysis of retrieval behavior conducted in this article makes it possible to understand why Dirichlet prior smoothing performs better than Jelinek-Mercer smoothing, and why the performance of the Jelinek-Mercer method improves when a length-based prior is included.
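To make the length dependence concrete, the following is a minimal sketch of the two basic smoothing estimates in their commonly used textbook form, not the article's own code. The function names, the default values of `lam` and `mu`, and the `log_prior` argument are assumptions made for illustration; the abstract does not state which form of length-based prior is used, so the prior is left as a plain additive log term supplied by the caller.

```python
import math
from collections import Counter


def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """Linear interpolation: the collection model always gets weight lam,
    regardless of how long the document is."""
    return (1.0 - lam) * (tf / doc_len) + lam * p_coll


def dirichlet_prior(tf, doc_len, p_coll, mu=1000.0):
    """Dirichlet prior smoothing: mu pseudo-counts drawn from the
    collection model are added to the document counts."""
    return (tf + mu * p_coll) / (doc_len + mu)


def dirichlet_background_weight(doc_len, mu=1000.0):
    """Share of probability mass given to the collection model.
    It shrinks as the document grows, unlike the fixed lam above."""
    return mu / (doc_len + mu)


def query_log_likelihood(query, doc, coll_model, smooth, log_prior=0.0):
    """Score a document by the log-likelihood of the query terms.

    coll_model must contain a non-zero probability for every query term.
    log_prior is a hypothetical hook for a length-based document prior.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    score = log_prior
    for term in query:
        score += math.log(smooth(counts[term], doc_len, coll_model[term]))
    return score
```

Under these assumptions, a document and a tenfold repetition of it receive the same Jelinek-Mercer score, because their relative term frequencies are identical, while the Dirichlet prior estimate trusts the longer document's counts more. A length-based prior could be passed through `log_prior`, for example `math.log(len(doc))`, which is only one illustrative choice.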