Learn to weight terms in information retrieval using category information

  • Authors:
  • Rong Jin; Joyce Y. Chai; Luo Si

  • Affiliations:
  • Michigan State University, MI; Michigan State University, MI; Carnegie Mellon University, PA

  • Venue:
  • ICML '05 Proceedings of the 22nd international conference on Machine learning
  • Year:
  • 2005

Abstract

How to assign appropriate weights to terms is one of the critical issues in information retrieval. Many term weighting schemes are unsupervised: they are based either on empirical observations in information retrieval or on generative approaches to language modeling. As a result, existing term weighting schemes are often insufficient for distinguishing informative words from uninformative ones, a distinction that is crucial to retrieval performance. In this paper, we present supervised term weighting schemes that automatically learn term weights from the correlation between word frequency and the category information of documents. Empirical studies on the ImageCLEF dataset indicate that the proposed methods perform substantially better than state-of-the-art term weighting approaches and than other alternatives that exploit category information for information retrieval.
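To make the idea of "term weights from the correlation between word frequency and category information" concrete, here is a minimal sketch. It is an illustrative stand-in, not the supervised learning procedure described in the paper: it assumes a term's weight is the correlation ratio η², the share of the variance of the term's per-document frequency that is explained by the document's category. The function names `correlation_ratio_weights` and `score`, and the toy documents, are hypothetical.

```python
from collections import Counter, defaultdict


def correlation_ratio_weights(docs, labels):
    """Hypothetical supervised weighting: eta^2 of each term's frequency w.r.t. category.

    0.0 means the term's frequency carries no category signal;
    1.0 means its frequency is fully determined by the category.
    """
    tfs = [Counter(doc) for doc in docs]
    vocab = set().union(*tfs)
    n = len(docs)
    weights = {}
    for term in vocab:
        freqs = [tf[term] for tf in tfs]  # Counter returns 0 for absent terms
        mean = sum(freqs) / n
        total_var = sum((f - mean) ** 2 for f in freqs)
        if total_var == 0:
            weights[term] = 0.0  # identical frequency in every document
            continue
        by_cat = defaultdict(list)
        for f, c in zip(freqs, labels):
            by_cat[c].append(f)
        # Between-category variance: how far each category mean is from the global mean
        between = sum(len(fs) * (sum(fs) / len(fs) - mean) ** 2 for fs in by_cat.values())
        weights[term] = between / total_var
    return weights


def score(query, doc, weights):
    """Hypothetical retrieval score: weighted term-frequency match on query terms."""
    tf = Counter(doc)
    return sum(weights.get(t, 0.0) * tf[t] for t in set(query))


if __name__ == "__main__":
    docs = [["red", "car", "the"], ["red", "truck", "the"],
            ["blue", "sky", "the"], ["blue", "sea", "the"]]
    labels = ["vehicle", "vehicle", "nature", "nature"]
    w = correlation_ratio_weights(docs, labels)
    print(round(w["red"], 3), round(w["car"], 3), round(w["the"], 3))  # 1.0 0.333 0.0
```

In this toy collection, "the" occurs equally in every document and gets weight 0, while "red", which perfectly separates the two categories, gets weight 1. An unsupervised scheme has no access to the labels and so cannot make this category-driven distinction.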