Document Length Normalization

  • Authors:
  • Amit Singhal; Gerard Salton; Mandar Mitra; Chris Buckley

  • Venue:
  • Document Length Normalization
  • Year:
  • 1995

Abstract

In the TREC collection (a large full-text experimental text collection with widely varying document lengths), we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
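
The comparison described in the abstract, likelihood of relevance versus likelihood of retrieval as a function of document length, can be illustrated with a minimal sketch. This is not the authors' code: the function name `bin_shares`, the equal-sized length bins, the use of the bin median as the representative length, and the toy data are all assumptions made for illustration. The sketch only shows how a mismatch between the two curves might be measured, not the modified technique itself.

```python
def bin_shares(doc_lengths, flagged, num_bins):
    """Return one (median_length, share) pair per length bin.

    doc_lengths: dict mapping document id -> length (bytes, terms, ...)
    flagged:     document ids in the set of interest (relevant or retrieved)
    share:       fraction of the flagged documents that fall in the bin
    (Hypothetical helper; the binning scheme is an assumption.)
    """
    # Order documents from shortest to longest.
    ordered = sorted(doc_lengths, key=lambda d: doc_lengths[d])
    flagged = [d for d in flagged if d in doc_lengths]
    # Ceiling division so every document lands in some bin.
    size = max(1, -(-len(ordered) // num_bins))
    result = []
    for start in range(0, len(ordered), size):
        members = ordered[start:start + size]
        member_set = set(members)
        hits = sum(1 for d in flagged if d in member_set)
        median_len = doc_lengths[members[len(members) // 2]]
        share = hits / len(flagged) if flagged else 0.0
        result.append((median_len, share))
    return result


# Toy collection (made-up numbers): longer documents are more often relevant,
# while the hypothetical retrieval run favours shorter ones.
lengths = {"d1": 100, "d2": 200, "d3": 400, "d4": 800, "d5": 1600, "d6": 3200}
relevant = ["d3", "d4", "d5", "d6"]
retrieved = ["d1", "d2", "d3", "d4"]

relevance_curve = bin_shares(lengths, relevant, num_bins=3)
retrieval_curve = bin_shares(lengths, retrieved, num_bins=3)

# Positive gap: documents of that length are retrieved less often than they
# are relevant; negative gap: they are retrieved more often than warranted.
gaps = [(length, rel - ret)
        for (length, rel), (_, ret) in zip(relevance_curve, retrieval_curve)]
print(gaps)
```

In this toy run the shortest bin has a negative gap and the longest bin a positive one, which is the kind of mismatch the abstract says a length-aware retrieval strategy should reduce.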