Learn to weight terms in information retrieval using category information

Authors:
Rong Jin;Joyce Y. Chai;Luo Si
Affiliations:
Michigan State University, MI;Michigan State University, MI;Carnegie Mellon University, MI
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Citing 13
Cited 2

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Probabilistic models in information retrieval

The Computer Journal - Special issue on information retrieval
Pivoted document length normalization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A hidden Markov model information retrieval system

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

A language modeling approach to information retrieval
Experimentation as a way of life: Okapi at TREC

Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Document language models, query models, and risk minimization for information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Title language model for information retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Two-stage language models for information retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization Based on Regularized Linear Classification Methods

Information Retrieval
Probabilistic models of indexing and searching

SIGIR '80 Proceedings of the 3rd annual ACM conference on Research and development in information retrieval

Efficient set-correlation operator inside databases

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Features' weight learning towards improved query classification

AIS'12 Proceedings of the Third international conference on Autonomous and Intelligent Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

How to assign appropriate weights to terms is one of the critical issues in information retrieval. Many term weighting schemes are unsupervised. They are either based on the empirical observation in information retrieval, or based on generative approaches for language modeling. As a result, the existing term weighting schemes are usually insufficient in distinguishing informative words from the uninformative ones, which is crucial to the performance of information retrieval. In this paper, we present supervised term weighting schemes that automatically learn term weights based on the correlation between word frequency and category information of documents. Empirical studies with the ImageCLEF dataset have indicated that the proposed methods perform substantially better than the state-of-the-art approaches for term weighting and other alternatives that exploit category information for information retrieval.