Towards a better understanding of language model information retrieval

  • Authors:
  • M. Van Der Heijden; I. G. Sprinkhuizen-Kuyper; Th. P. Van Der Weide

  • Affiliations:
  • Radboud University Nijmegen; Radboud University Nijmegen, Donders Institute for Brain Cognition and Behavior; Radboud University Nijmegen, Institute for Computing and Information Science

  • Venue:
  • FDIA'08 Proceedings of the 2nd BCS IRSG conference on Future Directions in Information Access
  • Year:
  • 2008

Abstract

Language models form a class of successful probabilistic models in information retrieval. However, knowledge of why some methods perform better than others in a particular situation remains limited. In this study we analyze which language model factors influence information retrieval performance. Starting from popular smoothing methods, we review which data features have been used. Document length and a measure of document word distribution turned out to be the important factors, in addition to a distinction between estimating the probability of seen and unseen words. We propose a class of parameter-free smoothing methods, of which multiple specific instances are possible. Instead of parameter tuning, however, an analysis of data features should be used to decide upon a specific method. Finally, we discuss some initial experiments.
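
To make the role of smoothing concrete, the sketch below shows a standard Dirichlet-smoothed query-likelihood ranker, a common baseline in this line of work (it is not the parameter-free method proposed in the paper). It illustrates two of the factors the abstract highlights: document length, which controls how strongly the document model is trusted relative to the collection model, and the handling of query words unseen in a document, which fall back on collection statistics. The function name, the toy documents, and the value of `mu` are illustrative assumptions.

```python
import math
from collections import Counter

def query_likelihood_scores(query_terms, documents, mu=2000.0):
    """Score tokenized documents against a query with Dirichlet-smoothed
    query likelihood (illustrative baseline, not the paper's method)."""
    # Collection language model: word frequencies pooled over all documents.
    collection_counts = Counter()
    for doc in documents:
        collection_counts.update(doc)
    collection_length = sum(collection_counts.values())

    scores = []
    for doc in documents:
        doc_counts = Counter(doc)
        doc_length = len(doc)
        log_score = 0.0
        for term in query_terms:
            p_collection = collection_counts[term] / collection_length
            if p_collection == 0.0:
                continue  # term absent from the whole collection; ignore it
            # Dirichlet smoothing: a word unseen in this document still gets
            # probability mass from the collection model; longer documents
            # rely less on smoothing than short ones.
            p_term = (doc_counts[term] + mu * p_collection) / (doc_length + mu)
            log_score += math.log(p_term)
        scores.append(log_score)
    return scores

# Toy usage: three short "documents" and a two-word query (made-up data).
docs = [
    "language models for information retrieval".split(),
    "smoothing of language models".split(),
    "probabilistic retrieval models".split(),
]
print(query_likelihood_scores(["language", "retrieval"], docs))
```

In this formulation the smoothing parameter `mu` must be tuned per collection; the abstract's proposal is to replace such tuning with choices driven by data features like document length and word distribution.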