Clustering the topics using TF-IDF for model fusion

  • Authors: Muath Alzghool; Diana Inkpen

  • Affiliations: University of Ottawa, Ottawa, ON, Canada; University of Ottawa, Ottawa, ON, Canada

  • Venue: Proceedings of the 2nd PhD workshop on Information and knowledge management
  • Year: 2008

Abstract

Users express their queries in various ways: sometimes with more general terms, sometimes with more specific ones. Information retrieval systems need to accommodate this variety of user needs. Some retrieval models perform better when the queries are general, others when the queries are more specific, and others when a combination of the two is available. In this paper we seek a system that performs well in all these cases. We present a new method for combining the results of different models in order to improve performance on a difficult task: information retrieval from spontaneous speech. Our technique clusters the training topics according to their tf-idf (term frequency-inverse document frequency) properties and selects the best models for each cluster. When the system runs on a test topic, it first determines the cluster of the topic and then uses the combination of models selected for that cluster. We report improvements on the Malach collection used at CLEF-CLSR 2007.
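
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each topic is summarized by two hypothetical features (the mean and maximum tf-idf weight of its terms), that topics are grouped with plain k-means, and that models are ranked within a cluster by their mean score on the training topics. The function names, the feature choice, the smoothed idf formula, and the `top_n` parameter are all illustrative.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed idf for every term in a tokenized collection (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def topic_features(topic, idf, default_idf=1.0):
    """Hypothetical 2-d summary of a topic: (mean, max) tf-idf of its terms."""
    tf = Counter(topic)
    weights = [
        (count / len(topic)) * idf.get(term, default_idf)
        for term, count in tf.items()
    ]
    return (sum(weights) / len(weights), max(weights))


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, initialized deterministically with the first k points."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def best_models_per_cluster(points, centroids, scores, top_n=1):
    """Rank models per cluster by mean training score.

    scores[model][i] is the effectiveness (for example, average precision)
    of that model on training topic i.
    """
    best = {}
    for c in range(len(centroids)):
        members = [i for i, p in enumerate(points) if nearest(p, centroids) == c]
        if not members:
            continue
        ranked = sorted(
            scores,
            key=lambda m: sum(scores[m][i] for i in members) / len(members),
            reverse=True,
        )
        best[c] = ranked[:top_n]
    return best
```

At test time, a new topic would be mapped to features with `topic_features`, assigned to a cluster with `nearest`, and answered with the models stored for that cluster in the dictionary returned by `best_models_per_cluster`. How the selected models' result lists are then fused is left open here, since the abstract does not specify it.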