Learning retrieval expert combinations with genetic algorithms

Authors:
Holger Billhardt;Daniel Borrajo;Victor Maojo
Affiliations:
Departamento de Ciencias Experimentales e Ingeniería, Universidad Rey Juan Carlos, 28933 Móstoles, Madrid, Spain;Departamento de Informática, Universidad Carlos III, 28911 Leganés, Madrid, Spain;Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
Venue:
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Year:
2003

Abstract

The goal of information retrieval (IR) is to provide models and systems that help users identify the documents relevant to their information needs. Extensive research has been carried out to develop retrieval methods that achieve this goal. These IR techniques range from purely syntax-based approaches, which consider only word frequencies, to more semantics-aware approaches. However, it seems clear that no single method works equally well on all collections and for all queries. Prior work suggests that combining the evidence from multiple retrieval experts can achieve significant improvements in retrieval effectiveness. A common problem of expert combination approaches is the selection of both the experts to be combined and the combination function. In most studies the experts are selected from a rather small set of candidates using some heuristics. Thus, only a small number of possible combinations is considered, and other, possibly better, solutions are left out. In this paper we propose the use of genetic algorithms to find a suboptimal combination of experts for a document collection at hand. Our approach automatically determines both the experts to be combined and the parameters of the combination function. Because we learn this combination for each specific document collection, this approach allows us to automatically adjust the IR system to specific user needs. To learn retrieval strategies that generalize well to new queries, we propose a fitness function that is based on the statistical significance of the average precision obtained on a set of training queries. We test and evaluate the approach on four classical text collections. The results show that the learned combination strategies perform better than any of the individual methods and that genetic algorithms provide a viable method to learn expert combinations. The experiments also evaluate the use of a semantic indexing approach, the context vector model, in combination with classical word matching techniques.
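
As a rough illustration of the idea described in the abstract, the following minimal sketch evolves a weight vector over a set of retrieval experts. It is not the authors' implementation, and the abstract does not specify the combination function or the genetic operators. The sketch assumes a weighted linear combination of expert scores, in which a weight of zero drops an expert, and it uses plain mean average precision over the training queries as a simplified stand-in for the significance-based fitness function the paper proposes. All names (average_precision, fitness, evolve, TOY_QUERIES) and the toy scores are hypothetical.

```python
import random


def average_precision(ranked_rel):
    """Average precision of one ranked list of relevance flags (True = relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def fitness(weights, queries):
    """Mean average precision of the weighted combination over the training queries.

    Assumption: a simplified stand-in for the paper's significance-based fitness.
    """
    aps = []
    for expert_scores, relevant in queries:
        n_docs = len(expert_scores[0])
        # Assumed combination function: weighted sum of the experts' scores.
        combined = [sum(w * s[d] for w, s in zip(weights, expert_scores))
                    for d in range(n_docs)]
        order = sorted(range(n_docs), key=lambda d: -combined[d])
        aps.append(average_precision([d in relevant for d in order]))
    return sum(aps) / len(aps)


def evolve(queries, n_experts, pop_size=20, generations=30, seed=0):
    """Search for expert weights with a small generational genetic algorithm."""
    rng = random.Random(seed)
    # Seed the population with every single expert, then add random mixtures.
    pop = [[1.0 if i == e else 0.0 for i in range(n_experts)]
           for e in range(n_experts)]
    while len(pop) < pop_size:
        pop.append([rng.random() for _ in range(n_experts)])

    def score(w):
        return fitness(w, queries)

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(ranked, 3), key=score)
            b = max(rng.sample(ranked, 3), key=score)
            # Uniform crossover, then Gaussian mutation clipped at zero.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [max(0.0, w + rng.gauss(0, 0.1)) if rng.random() < 0.2 else w
                     for w in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best)


# Toy training data (hypothetical): for each query, one score list per expert
# and the set of relevant document indices.
TOY_QUERIES = [
    ([[0.9, 0.1, 0.8, 0.2], [0.2, 0.3, 0.9, 0.1]], {0, 2}),
    ([[0.4, 0.7, 0.1, 0.3], [0.8, 0.2, 0.1, 0.6]], {0, 3}),
]

best_weights, best_fitness = evolve(TOY_QUERIES, n_experts=2)
print(best_weights, best_fitness)
```

Because the initial population contains each expert on its own and the best individuals are carried over unchanged, the learned combination in this sketch can never score worse on the training queries than the best single expert. Whether it generalizes to unseen queries is a separate question, which is the concern the paper's significance-based fitness function is meant to address.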