Document Length Normalization

Authors:
Amit Singhal, Gerard Salton, Mandar Mitra, Chris Buckley
Year:
1995

Abstract

In the TREC collection, a large full-text experimental collection with widely varying document lengths, we observe that the likelihood of a document being judged relevant by a user increases with document length. We show that a retrieval strategy that retrieves documents of different lengths with roughly equal probability, such as the vector-space cosine match, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a given length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
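The abstract describes the modified technique only qualitatively. As a hedged sketch, the Python below contrasts standard cosine normalization with the pivoted normalization factor that Singhal, Buckley and Mitra later published at SIGIR '96, (1 - slope) * pivot + slope * old_norm. Whether this 1995 report uses exactly this form is an assumption. The function names, the toy documents, the binary term weights, and the slope of 0.25 are illustrative choices, not values taken from the paper.

```python
import math


def cosine_norm(weights):
    """Euclidean length of a document's term-weight vector (cosine normalization)."""
    return math.sqrt(sum(w * w for w in weights))


def pivoted_norm(old_norm, pivot, slope):
    """Pivoted normalization factor (form assumed from the SIGIR '96 follow-up).

    The line is rotated about `pivot`: documents whose old_norm exceeds the pivot
    get a smaller factor than before (less length penalty), while documents below
    the pivot get a larger one. slope=1.0 reproduces old_norm exactly.
    """
    return (1.0 - slope) * pivot + slope * old_norm


def score(query_terms, doc_weights, norm):
    """Inner product of a binary query with document weights, divided by a normalization factor."""
    return sum(doc_weights.get(t, 0.0) for t in query_terms) / norm


if __name__ == "__main__":
    # Hypothetical toy collection: a short document and a long one that both
    # contain the two query terms; the long one has 14 extra terms.
    short_doc = {"length": 1.0, "normalization": 1.0}
    long_doc = {"length": 1.0, "normalization": 1.0, **{f"t{i}": 1.0 for i in range(14)}}
    query = ["length", "normalization"]

    norms = [cosine_norm(d.values()) for d in (short_doc, long_doc)]
    pivot = sum(norms) / len(norms)  # assumption: pivot = average old normalization factor
    slope = 0.25                     # hypothetical tuning value

    for name, doc, n in zip(("short", "long"), (short_doc, long_doc), norms):
        cos = score(query, doc, n)
        piv = score(query, doc, pivoted_norm(n, pivot, slope))
        print(f"{name}: cosine={cos:.3f} pivoted={piv:.3f}")
```

With cosine normalization the short document scores about 2.8 times higher than the long one; with the pivoted factor the gap narrows to about 1.3, which is the direction the abstract calls for: long documents become relatively more likely to be retrieved.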