Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A hidden Markov model information retrieval system
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A general language model for information retrieval (poster abstract)
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
A language modeling approach to information retrieval
Experimentation as a way of life: Okapi at TREC
Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
Document language models, query models, and risk minimization for information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance based language models
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Model-based feedback in the language modeling approach to information retrieval
Proceedings of the tenth international conference on Information and knowledge management
Modern Information Retrieval
The Importance of Prior Probabilities for Entry Page Search
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Two-stage language models for information retrieval
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Bayesian extension to the language model for ad hoc information retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Dependence language model for information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus structure, language models, and ad hoc information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Modeling word burstiness using the Dirichlet distribution
ICML '05 Proceedings of the 22nd international conference on Machine learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Regularized estimation of mixture models for robust pseudo-relevance feedback
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
A new probabilistic retrieval model based on the dirichlet compound multinomial distribution
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Active relevance feedback for difficult queries
Proceedings of the 17th ACM conference on Information and knowledge management
Deriving TF-IDF as a fisher kernel
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Hi-index | 0.00 |
The classical probabilistic models attempt to capture the ad hoc information retrieval problem within a rigorous probabilistic framework. It has long been recognized that the primary obstacle to the effective performance of the probabilistic models is the need to estimate a relevance model. The Dirichlet compound multinomial (DCM) distribution based on the Polya Urn scheme, which can also be considered as a hierarchical Bayesian model, is a more appropriate generative model than the traditional multinomial distribution for text documents. We explore a new probabilistic model based on the DCM distribution, which enables efficient retrieval and accurate ranking. Because the DCM distribution captures the dependency of repetitive word occurrences, the new probabilistic model based on this distribution is able to model the concavity of the score function more effectively. To avoid the empirical tuning of retrieval parameters, we design several parameter estimation algorithms to automatically set model parameters. Additionally, we propose a pseudo-relevance feedback algorithm based on the mixture modeling of the Dirichlet compound multinomial distribution to further improve retrieval accuracy. Finally, our experiments show that both the baseline probabilistic retrieval algorithm based on the DCM distribution and the corresponding pseudo-relevance feedback algorithm outperform the existing language modeling systems on several TREC retrieval tasks. The main objective of this research is to develop an effective probabilistic model based on the DCM distribution. A secondary objective is to provide a thorough understanding of the probabilistic retrieval model by a theoretical understanding of various text distribution assumptions.