Adaptive term frequency normalization for BM25

Authors:
Yuanhua Lv; ChengXiang Zhai
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 9

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal

Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)

A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Simple BM25 extension to multiple weighted fields
Proceedings of the thirteenth ACM international conference on Information and knowledge management

On setting the hyper-parameters of term frequency normalization for information retrieval
ACM Transactions on Information Systems (TOIS)

When documents are very long, BM25 fails!
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Lower-bounding term frequency normalization
Proceedings of the 20th ACM international conference on Information and knowledge management

Cited 3

Lower-bounding term frequency normalization
Proceedings of the 20th ACM international conference on Information and knowledge management

A log-logistic model-based interpretation of TF normalization of BM25
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Composition of TF normalizations: new insights on scoring functions for ad hoc IR
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Abstract

A key component of BM25 contributing to its success is its sublinear term frequency (TF) normalization formula. The scale and shape of this TF normalization component are controlled by a parameter k1, which is generally set to a term-independent constant. We hypothesize and show empirically that, in order to optimize retrieval performance, this parameter should be set in a term-specific way. Following this intuition, we propose an information gain measure to directly estimate the contributions of repeated term occurrences, which is then exploited to fit the BM25 function to predict a term-specific k1. Our experimental results show that the proposed approach, without needing any training data, can efficiently and automatically estimate a term-specific k1, and is more effective and robust than the standard BM25.
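
To make the role of k1 concrete, the sketch below implements the commonly used form of the BM25 TF component, tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avdl)), and shows how a per-term k1 changes the saturation of repeated occurrences. It is only an illustration of the standard formula, not the information gain estimator proposed in the paper. The function name `bm25_tf`, the example terms, and the k1 values in `k1_by_term` are all hypothetical.

```python
def bm25_tf(tf, k1, b=0.75, dl=1.0, avdl=1.0):
    """Standard BM25 term-frequency component.

    tf   : raw count of the term in the document
    k1   : saturation parameter (the quantity the abstract proposes to set per term)
    b    : document length normalization strength
    dl   : document length; avdl : average document length in the collection
    """
    if tf <= 0:
        return 0.0
    length_norm = 1.0 - b + b * (dl / avdl)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Hypothetical per-term k1 values, chosen only for illustration.
k1_by_term = {"retrieval": 0.5, "normalization": 2.0}
DEFAULT_K1 = 1.2  # a typical term-independent setting

for term, k1 in k1_by_term.items():
    # The component equals 1 at tf = 1 (when dl == avdl) and approaches k1 + 1
    # as tf grows, so a smaller k1 means extra occurrences add less.
    curve = [round(bm25_tf(tf, k1), 3) for tf in (1, 2, 5, 10)]
    print(term, "k1 =", k1, "->", curve)
```

With a single global k1, every term follows the same saturation curve. A term-specific k1 lets terms whose repeated occurrences carry little extra information saturate sooner than terms whose repetitions remain informative.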