A comparative study of probabilistic and language models for information retrieval

  • Authors:
  • Graham Bennett; Falk Scholer; Alexandra Uitdenbogerd

  • Affiliations:
  • RMIT University, Melbourne, Australia (all authors)

  • Venue:
  • ADC '08 Proceedings of the nineteenth conference on Australasian database - Volume 75
  • Year:
  • 2008

Abstract

Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach to information retrieval used different query sets and heterogeneous collections, making reported results difficult to compare. This research is a broad-based study that evaluates language models across a variety of search tasks: topic finding, named-page finding and topic distillation. The standard Text REtrieval Conference (TREC) methodology is used to compare language models with the probabilistic Okapi BM25 system. Using consistent parameter choices, we compare the results of different language models on three search tasks, multiple query sets and three text collections. For ad hoc retrieval, the Dirichlet smoothing method was found to be significantly better than Okapi BM25, but for named-page finding Okapi BM25 was more effective than the language modelling methods. Optimal smoothing parameters for each method were found to depend on both the collection and the query set. For longer queries, the language modelling approaches required more aggressive smoothing, but they were also more effective than with shorter queries. The choice of smoothing method was also found to have a significant effect on the performance of language models for information retrieval.
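
For context, the two retrieval functions compared in the study can be sketched as follows. These are the standard textbook formulations of Dirichlet-prior query-likelihood smoothing and Okapi BM25, not expressions reproduced from the paper; the symbols mu, k_1, b, tf, df, |D| and avgdl follow common usage rather than the paper's notation.

% Query likelihood with Dirichlet-prior smoothing: mu is the smoothing
% parameter; larger mu shifts weight towards the collection model p(t|C).
\[
  \log p(Q \mid D) \;=\; \sum_{t \in Q} \log
    \frac{\mathrm{tf}(t,D) + \mu\, p(t \mid C)}{|D| + \mu}
\]

% Okapi BM25: k_1 controls term-frequency saturation and b controls
% document-length normalisation against the average document length avgdl.
\[
  \mathrm{BM25}(Q,D) \;=\; \sum_{t \in Q}
    \log\frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5}
    \cdot
    \frac{(k_1 + 1)\,\mathrm{tf}(t,D)}
         {\mathrm{tf}(t,D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]

In the Dirichlet formulation, larger values of mu correspond to heavier smoothing, which is consistent with the abstract's observation that longer queries required more aggressive smoothing and that optimal settings varied with the collection and query set.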