Evolving local and global weighting schemes in information retrieval

  • Authors:
  • Ronan Cummins;Colm O'Riordan

  • Affiliations:
  • Department of Information Technology, National University of Ireland, Galway, Ireland;Department of Information Technology, National University of Ireland, Galway, Ireland

  • Venue:
  • Information Retrieval
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a method, using Genetic Programming, to automatically determine term weighting schemes for the vector space model. Based on a set of queries and their human determined relevant documents, weighting schemes are evolved which achieve a high average precision. In Information Retrieval (IR) systems, useful information for term weighting schemes is available from the query, individual documents and the collection as a whole.We evolve term weighting schemes in both local (within-document) and global (collection-wide) domains which interact with each other correctly to achieve a high average precision. These weighting schemes are tested on well-known test collections and are compared to the traditional tf-idf weighting scheme and to the BM25 weighting scheme using standard IR performance metrics.Furthermore, we show that the global weighting schemes evolved on small collections also increase average precision on larger TREC data. These global weighting schemes are shown to adhere to Luhn's resolving power as both high and low frequency terms are assigned low weights. However, the local weightings evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on large collections.