A combined component approach for finding collection-adapted ranking functions based on genetic programming

  • Authors:
  • Humberto Mossri de Almeida;Marcos André Gonçalves;Marco Cristo;Pável Calado

  • Affiliations:
  • Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;FUCAPI - Analysis, Research and Tech. Innovation Center, Manaus, Brazil;IST/INESC-ID, Lisboa, Portugal

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA)is based on the combination of several term-weighting components (i.e.,term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outper form standard TF-IDF, BM25 and another GP-based approach in two different collections. CCA obtained improvements in mean average precision up to 40.87% for the TREC-8 collection, and 24.85% for the WBR99 collection (a large Brazilian Web collection), over the baseline functions. The CCA evolution process also was able to reduce the overtraining, commonly found in machine learning methods, especially genetic programming, and to converge faster than the other GP-based approach used for comparison.