Compression-based document length prior for language models

Authors:
Javier Parapar;David E. Losada;Álvaro Barreiro
Affiliations:
University of A Coruña, A Coruña, Spain;University of Santiago de Compostela, Santiago de Compostela, Spain;University of A Coruña, A Coruña, Spain
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Citing 5
Cited 0

Pivoted document length normalization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to information retrieval

ACM Transactions on Information Systems (TOIS)
An analysis on document length retrieval trends in language modeling smoothing

Information Retrieval
On compression-based text classification

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Clustering by compression

IEEE Transactions on Information Theory

Quantified Score

Hi-index	0.00

Visualization

Abstract

The inclusion of document length factors has been a major topic in the development of retrieval models. We believe that current models can be further improved by more refined estimations of the document's scope. In this poster we present a new document length prior that uses the size of the compressed document. This new prior is introduced in the context of Language Modeling with Dirichlet smoothing. The evaluation performed on several collections shows significant improvements in effectiveness.