Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing
Foundations of statistical natural language processing
A hidden Markov model information retrieval system
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval
A language modeling approach to information retrieval
A vector space model for automatic indexing
Communications of the ACM
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Model-based feedback in the language modeling approach to information retrieval
Proceedings of the tenth international conference on Information and knowledge management
Advances in Informational Retrieval: Recent Research from the Center for Intelligent Information Retrieval
The Importance of Prior Probabilities for Entry Page Search
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Two-stage language models for information retrieval
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
Bayesian extension to the language model for ad hoc information retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Language Modeling for Information Retrieval
Language Modeling for Information Retrieval
A study of smoothing methods for language models applied to information retrieval
ACM Transactions on Information Systems (TOIS)
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Formal multiple-bernoulli models for language modeling
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
A generative theory of relevance
A generative theory of relevance
Proceedings of the 14th ACM international conference on Information and knowledge management
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Aspects of sentence retrieval
An analysis on document length retrieval trends in language modeling smoothing
Information Retrieval
A retrieval evaluation methodology for incomplete relevance assessments
ECIR'07 Proceedings of the 29th European conference on IR research
Age dependent document priors in link structure analysis
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Frequentist and bayesian approach to information retrieval
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
An efficient computation of the multiple-bernoulli language model
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Probabilistic self-organizing maps for qualitative data
Neural Networks
A novel neighborhood based document smoothing model for information retrieval
Information Retrieval
Hi-index | 0.01 |
Although the seminal proposal to introduce language modeling in information retrieval was based on a multivariate Bernoulli model, the predominant modeling approach is now centered on multinomial models. Language modeling for retrieval based on multivariate Bernoulli distributions is seen inefficient and believed less effective than the multinomial model. In this article, we examine the multivariate Bernoulli model with respect to its successor and examine its role in future retrieval systems. In the context of Bayesian learning, these two modeling approaches are described, contrasted, and compared both theoretically and computationally. We show that the query likelihood following a multivariate Bernoulli distribution introduces interesting retrieval features which may be useful for specific retrieval tasks such as sentence retrieval. Then, we address the efficiency aspect and show that algorithms can be designed to perform retrieval efficiently for multivariate Bernoulli models, before performing an empirical comparison to study the behaviorial aspects of the models. A series of comparisons is then conducted on a number of test collections and retrieval tasks to determine the empirical and practical differences between the different models. Our results indicate that for sentence retrieval the multivariate Bernoulli model can significantly outperform the multinomial model. However, for the other tasks the multinomial model provides consistently better performance (and in most cases significantly so). An analysis of the various retrieval characteristics reveals that the multivariate Bernoulli model tends to promote long documents whose nonquery terms are informative. While this is detrimental to the task of document retrieval (documents tend to contain considerable nonquery content), it is valuable for other tasks such as sentence retrieval, where the retrieved elements are very short and focused.