A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
Distribution of content words and phrases in text and language modelling
Natural Language Engineering
Modeling word burstiness using the Dirichlet distribution
ICML '05 Proceedings of the 22nd international conference on Machine learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Bridging Language Modeling and Divergence from Randomness Models: A Log-Logistic Model for IR
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Retrieval constraints and word frequency distributions: a log-logistic model for IR
Proceedings of the 18th ACM conference on Information and knowledge management
Information-based models for ad hoc IR
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Retrieval constraints and word frequency distributions: a log-logistic model for IR
Information Retrieval
In this paper, we first review the phenomena of burstiness and the aftereffect of future sampling, and propose a formal, operational criterion for characterizing distributions with respect to these phenomena. We then introduce the Beta negative binomial distribution for text modeling and show its relation to several models, in particular the Laplace law of succession and the tf-itf model used in the Divergence from Randomness framework of [2]. Finally, we illustrate the behavior of this distribution in text categorization and information retrieval experiments.
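To make the Beta negative binomial distribution more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard textbook parameterization with parameters `r`, `alpha` and `beta`, which may differ from the one used in the paper. The helper `continuation_ratios` is hypothetical: it computes P(X >= x+1 | X >= x), one possible numerical probe of burstiness, and is not claimed to be the criterion the paper proposes. The parameter values are purely illustrative.

```python
import math


def bnb_log_pmf(k, r, alpha, beta):
    """Log-probability of count k under a Beta negative binomial.

    Assumed (textbook) parameterization:
    P(X=k) = Gamma(r+k) / (k! Gamma(r)) * B(alpha+r, beta+k) / B(alpha, beta)
    """
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    return (math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
            + log_beta(alpha + r, beta + k) - log_beta(alpha, beta))


def bnb_pmf(k, r, alpha, beta):
    return math.exp(bnb_log_pmf(k, r, alpha, beta))


def continuation_ratios(pmf, max_x, support=5000):
    """Return P(X >= x+1 | X >= x) for x = 0 .. max_x-1.

    The infinite support is truncated at `support` terms, which is an
    approximation; it is accurate here because the tail decays quickly.
    """
    probs = [pmf(k) for k in range(support)]
    survival = [0.0] * (support + 1)  # survival[x] approximates P(X >= x)
    for k in range(support - 1, -1, -1):
        survival[k] = survival[k + 1] + probs[k]
    return [survival[x + 1] / survival[x] for x in range(max_x)]


# Illustrative parameter values only (assumptions, not taken from the paper).
R, ALPHA, BETA = 1.0, 3.0, 1.0
ratios = continuation_ratios(lambda k: bnb_pmf(k, R, ALPHA, BETA), max_x=10)

if __name__ == "__main__":
    for x, ratio in enumerate(ratios):
        print(f"P(X >= {x + 1} | X >= {x}) = {ratio:.4f}")
```

For these particular values the ratio grows with `x`: the more occurrences of a word already seen, the more likely a further one becomes. A geometric distribution would instead give a constant ratio. Other parameter settings need not behave the same way.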