Word document density and relevance scoring (poster session)

  • Authors:
  • Martin Franz; J. Scott McCarley

  • Affiliations:
  • IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY; IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY

  • Venue:
  • SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2000

Abstract

Previous work on the distribution of words in documents has shown that word repetitiveness is an important indicator of whether a word is content-bearing. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words with similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks [7] show useful performance improvements.
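
The abstract does not give the formula for the repetition measure, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes a simple proxy: the mean number of occurrences of a word per document that contains it (collection frequency divided by document frequency). The function name `repetition_stats` and the toy documents are hypothetical.

```python
from collections import Counter


def repetition_stats(docs):
    """Map each word to (document frequency, mean occurrences per containing document).

    The second value is an assumed proxy for within-document repetition;
    it is not the measure defined in the paper.
    """
    doc_freq = Counter()   # number of documents containing the word
    coll_freq = Counter()  # total occurrences across the collection
    for tokens in docs:
        for word, n in Counter(tokens).items():
            doc_freq[word] += 1
            coll_freq[word] += n
    return {w: (doc_freq[w], coll_freq[w] / doc_freq[w]) for w in doc_freq}


# Toy collection: "retrieval" and "however" occur in the same number of documents.
docs = [
    "the retrieval system ranks retrieval results".split(),
    "retrieval of spoken documents".split(),
    "however the results vary".split(),
    "however the system is simple".split(),
]
stats = repetition_stats(docs)
print(stats["retrieval"], stats["however"])
```

In this toy collection the two words share a document frequency of 2, so an IDF-style weight cannot tell them apart, yet "retrieval" repeats within a document while "however" does not. A relevance score could use such a statistic to adjust term weights, under whatever definition the paper actually adopts.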