Term frequency normalisation tuning for BM25 and DFR models

Authors:
Ben He; Iadh Ounis
Affiliations:
Department of Computing Science, University of Glasgow, United Kingdom; Department of Computing Science, University of Glasgow, United Kingdom
Venue:
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Year:
2005

References (8):
Pivoted document length normalization. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Analysis of a very large web search engine query log. ACM SIGIR Forum
A probabilistic model of information retrieval: development and comparative experiments. Information Processing and Management: an International Journal
Query-based sampling of text databases. ACM Transactions on Information Systems (TOIS)
Information Retrieval. Information Retrieval
Document normalization revisited. SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS)
A study of parameter tuning for term frequency normalization. CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Cited by (10):
A study of the Dirichlet priors for term frequency normalisation. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method. InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Query performance prediction. Information Systems
Parameter sensitivity in the probabilistic model for ad-hoc retrieval. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artificial Intelligence Review
Setting per-field normalisation hyper-parameters for the named-page finding search task. ECIR'07 Proceedings of the 29th European conference on IR research
Reverted indexing for feedback and expansion. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Document length normalization using effective level of term frequency in large collections. ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
A constraint to automatically regulate document-length normalisation. Proceedings of the 21st ACM international conference on Information and knowledge management
A nonparametric term weighting method for information retrieval based on measuring the divergence from independence. Information Retrieval

Abstract

Tuning the term frequency normalisation parameter is a crucial issue in information retrieval (IR), as it has an important impact on retrieval performance. The classical pivoted normalisation approach suffers from the collection-dependence problem: it requires relevance assessments for each given collection to obtain the optimal parameter setting. In this paper, we tackle the collection-dependence problem by proposing a new tuning method based on measuring the normalisation effect. The proposed method refines and extends our methodology described in [7]. In our experiments, we evaluate the proposed tuning method on various TREC collections, for both Normalisation 2 of the Divergence From Randomness (DFR) models and BM25's normalisation method. Results show that, for both normalisation methods, our tuning method significantly outperforms robust empirically obtained baselines over diverse TREC collections, at a marginal computational cost.
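
For orientation, the two normalisation methods named in the abstract can be sketched in Python as below. This is an illustrative sketch only: the formulas are the commonly cited textbook forms of BM25's length normalisation and DFR's Normalisation 2, not text taken from this page, and the function names and the default values b=0.75 and c=1.0 are assumptions chosen for the example. The parameters b and c are the quantities a tuning method of this kind would set.

```python
import math


def bm25_tfn(tf, doc_len, avg_doc_len, b=0.75):
    """Length-normalised term frequency as used inside BM25 (textbook form).

    tfn = tf / ((1 - b) + b * doc_len / avg_doc_len)
    b = 0 switches normalisation off; b = 1 applies it fully.
    The default b = 0.75 is an assumed, commonly used value.
    """
    return tf / ((1.0 - b) + b * doc_len / avg_doc_len)


def dfr_normalisation2_tfn(tf, doc_len, avg_doc_len, c=1.0):
    """Normalised term frequency under DFR Normalisation 2 (textbook form).

    tfn = tf * log2(1 + c * avg_doc_len / doc_len)
    The default c = 1.0 is an assumed value for illustration.
    """
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # A term occurring 3 times in documents of different lengths,
    # with an average document length of 100 tokens.
    for doc_len in (50, 100, 200):
        print(
            doc_len,
            round(bm25_tfn(3, doc_len, 100), 3),
            round(dfr_normalisation2_tfn(3, doc_len, 100), 3),
        )
```

With these defaults, a document of exactly average length keeps its raw term frequency under both methods, while longer documents are penalised. How strongly they are penalised depends on b and c, which is why the best setting can vary from one collection to another.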