Clustering the topics using TF-IDF for model fusion

Authors:
Muath Alzghool;Diana Inkpen
Affiliations:
University of Ottawa, Ottawa, ON, Canada;University of Ottawa, Ottawa, ON, Canada
Venue:
Proceedings of the 2nd PhD workshop on Information and knowledge management
Year:
2008

Abstract

Users express their queries in various ways: sometimes they use more general terms, sometimes more specific ones. Information retrieval systems need to accommodate this variety. Some retrieval models perform better when the queries are general, others when the queries are more specific, and others when a combination of both is available. In this paper we look for a system that performs well in all of these cases. We present a new method for combining the results of different models in order to improve performance on a difficult task: information retrieval from spontaneous speech. Our technique is based on clustering the training topics according to their tf-idf (term frequency-inverse document frequency) properties and selecting the best models for each cluster. When the system runs on a test topic, the cluster of the topic is determined first, and the combination of models selected for that cluster is used. We report improvements on the Malach collection used at CLEF-CLSR 2007.
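
The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not say which tf-idf properties are used, so each topic is represented by its mean term frequency and mean inverse document frequency; a plain k-means stands in for the unspecified clustering algorithm; and a single best model per cluster (by summed average precision on the training topics) replaces the paper's combination of models. All function names and the toy data are hypothetical.

```python
import math

def topic_features(topic_terms, doc_freq, n_docs):
    """Map a topic (list of terms) to (mean tf, mean idf). Feature choice is an assumption."""
    counts = {}
    for term in topic_terms:
        counts[term] = counts.get(term, 0) + 1
    mean_tf = sum(counts.values()) / len(counts)
    mean_idf = sum(
        math.log(n_docs / (1 + doc_freq.get(term, 0))) for term in counts
    ) / len(counts)
    return (mean_tf, mean_idf)

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic seeding; assumes len(points) >= k."""
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def best_model_per_cluster(points, scores, centroids):
    """scores[i] maps model name -> average precision on training topic i."""
    totals = [dict() for _ in centroids]
    for p, topic_scores in zip(points, scores):
        c = nearest(p, centroids)
        for model, ap in topic_scores.items():
            totals[c][model] = totals[c].get(model, 0.0) + ap
    return [max(t, key=t.get) if t else None for t in totals]

def choose_model(test_point, centroids, best):
    """At test time: find the topic's cluster, then use that cluster's model."""
    return best[nearest(test_point, centroids)]

# Toy training data (invented): feature vectors and per-model average precision.
train_points = [(1.0, 1.0), (1.2, 0.9), (1.0, 5.0), (1.1, 5.2)]
train_scores = [
    {"A": 0.4, "B": 0.2}, {"A": 0.5, "B": 0.3},
    {"A": 0.1, "B": 0.6}, {"A": 0.2, "B": 0.5},
]
centroids = kmeans(train_points, k=2)
best = best_model_per_cluster(train_points, train_scores, centroids)
print(choose_model((1.1, 1.1), centroids, best))
print(choose_model((1.0, 4.8), centroids, best))
```

On this toy data, the low-idf (more general) test topic is routed to model A and the high-idf (more specific) one to model B. Extending the sketch to a true fusion would mean storing a set of models, or fusion weights, per cluster instead of a single name.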