Term Impacts as Normalized Term Frequencies for BM25 Similarity Scoring

Authors:
Vo Ngoc Anh; Raymond Wan; Alistair Moffat
Affiliations:
Vo Ngoc Anh and Alistair Moffat: Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia 3010
Raymond Wan: Bioinformatics Center, Kyoto University, Kyoto, Japan 611-0011
Venue:
SPIRE '08: Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

The BM25 similarity computation has been shown to provide effective document retrieval. In operational terms, the formulae that form the basis for BM25 employ both term frequency and document length normalization. This paper considers an alternative form of normalization using document-centric impacts, and shows that the new normalization simplifies BM25 and reduces the number of tuning parameters. Motivation is provided by a preliminary analysis of a document collection, which shows that impacts are more likely to identify documents whose lengths resemble those of the documents judged relevant. Experiments on TREC data demonstrate that impact-based BM25 is as good as or better than the original term-frequency-based BM25 in terms of retrieval effectiveness.
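To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' formulation. It assumes the widely used BM25 term weight idf(t) * f(d,t) * (k1 + 1) / (f(d,t) + K), with K = k1 * ((1 - b) + b * L(d) / avg L), and a common non-negative idf variant. The impact-based form is an assumption: the document-centric impact replaces f(d,t), and the length term (b and the average document length) is dropped because impacts are already normalized within each document. The function names (`idf`, `bm25_tf_score`, `document_impacts`, `bm25_impact_score`), the geometric rank-binning rule, and `levels=8` are all hypothetical choices made for illustration; the paper's exact impact assignment may differ.

```python
import math
from collections import Counter


def idf(num_docs: int, doc_freq: int) -> float:
    # A common non-negative BM25 idf variant (an assumption, not necessarily the paper's).
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_score(docs, query, k1=1.2, b=0.75):
    # Classic BM25: raw term frequency f(d,t), damped by k1, with length normalization via b.
    tfs = [Counter(d) for d in docs]
    lengths = [len(d) for d in docs]
    avg_len = sum(lengths) / len(docs)
    scores = []
    for tf, length in zip(tfs, lengths):
        K = k1 * ((1 - b) + b * length / avg_len)
        score = 0.0
        for t in set(query):  # repeated query terms are counted once
            f = tf[t]  # Counter returns 0 for absent terms
            if f:
                n_t = sum(1 for other in tfs if t in other)
                score += idf(len(docs), n_t) * f * (k1 + 1) / (f + K)
        scores.append(score)
    return scores


def document_impacts(doc, levels=8):
    # HYPOTHETICAL document-centric impacts: rank the distinct terms of one document
    # by decreasing frequency, then map ranks to integer impacts levels..1 using
    # geometrically growing bins (base ** levels == number of distinct terms + 1).
    tf = Counter(doc)
    if not tf:
        return {}
    ranked = sorted(tf, key=lambda t: (-tf[t], t))  # ties broken alphabetically
    base = (len(ranked) + 1) ** (1.0 / levels)
    impacts = {}
    for rank, term in enumerate(ranked, start=1):
        level = levels - int(math.log(rank) / math.log(base))
        impacts[term] = max(1, min(levels, level))  # clamp against float rounding
    return impacts


def bm25_impact_score(docs, query, k1=1.2, levels=8):
    # ASSUMED impact-based BM25: the impact i(d,t) replaces f(d,t), and there is
    # no b parameter and no average-length statistic.
    impacts = [document_impacts(d, levels) for d in docs]
    scores = []
    for imp in impacts:
        score = 0.0
        for t in set(query):
            i = imp.get(t, 0)
            if i:
                n_t = sum(1 for other in impacts if t in other)
                score += idf(len(docs), n_t) * i * (k1 + 1) / (i + k1)
        scores.append(score)
    return scores
```

For example, with the documents `[["x"], ["x", "x"], ["y"]]` and the query `["x"]`, the first two documents receive identical impact-based scores, because each assigns `"x"` the top impact, whereas classic BM25 scores them differently through f(d,t) and the length factor. In this sketch the only remaining BM25 tuning parameter is k1, although the number of impact levels is itself a choice.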