We first present an analytical view of heuristic retrieval constraints that yields simple tests for determining whether a retrieval function satisfies them. We then review empirical findings on word frequency distributions and the central role that burstiness plays in them. This leads us to propose a formal definition of burstiness that can be used to characterize probability distributions with respect to this phenomenon. We then introduce the family of information-based IR models, which naturally captures heuristic retrieval constraints when the underlying probability distribution is bursty, and propose a new IR model within this family, based on the log-logistic distribution. Experiments on several collections illustrate the good behavior of the log-logistic IR model: it significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on most of the collections we used, with both short and long queries, in terms of both MAP and precision at 10 documents. It also compares favorably to BM25 and performs similarly to classical DFR models such as InL2 and PL2.
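The abstract does not state the scoring formula, so the following is only a minimal sketch of how a log-logistic, information-based score might be computed. It assumes the form commonly used for this model: a length-normalized term frequency t = tf * log(1 + c * avgdl / dl), a collection parameter lambda = df / N, and a per-term contribution of log((lambda + t) / lambda). The function names, the parameter `c`, and the dictionary-based inputs are illustrative assumptions, not the authors' code.

```python
import math


def normalized_tf(tf, doc_len, avg_doc_len, c=1.0):
    """Length-normalized term frequency (assumed DFR-style normalization)."""
    return tf * math.log(1.0 + c * avg_doc_len / doc_len)


def log_logistic_score(query_tf, doc_tf, doc_len, avg_doc_len,
                       doc_freq, num_docs, c=1.0):
    """Score one document against a query.

    query_tf: term -> frequency in the query
    doc_tf:   term -> frequency in the document
    doc_freq: term -> number of documents containing the term
    All names here are hypothetical; the formula is an assumption
    about the model, not taken from the abstract.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue  # terms absent from the document contribute nothing
        lam = df / num_docs  # collection parameter for this term
        t = normalized_tf(tf, doc_len, avg_doc_len, c)
        # Information content -log P(T > t) under a log-logistic distribution
        score += qtf * math.log((lam + t) / lam)
    return score
```

Under these assumptions the per-term contribution grows with term frequency but with diminishing increments, decreases as the document gets longer, and is larger for rarer terms, which is the kind of behavior the heuristic retrieval constraints ask for.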