Using genetic algorithms to evolve a population of topical queries

  • Authors:
  • Rocío L. Cecchini; Carlos M. Lorenzetti; Ana G. Maguitman; Nélida Beatríz Brignole

  • Affiliations:
  • LIDeCC - Laboratorio de Investigación y Desarrollo en Computación Científica, Universidad Nacional del Sur, Av. Alem 1253, (8000) Bahía Blanca, Argentina and Departamento de Ci ...; LIDIA - Laboratorio de Investigación y Desarrollo en Inteligencia Artificial, Universidad Nacional del Sur, Av. Alem 1253, (8000) Bahía Blanca, Argentina and Departamento de Ciencias e I ...; LIDIA - Laboratorio de Investigación y Desarrollo en Inteligencia Artificial, Universidad Nacional del Sur, Av. Alem 1253, (8000) Bahía Blanca, Argentina and Departamento de Ciencias e I ...; LIDeCC - Laboratorio de Investigación y Desarrollo en Computación Científica, Universidad Nacional del Sur, Av. Alem 1253, (8000) Bahía Blanca, Argentina and Departamento de Ci ...

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2008

Abstract

Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query in retrieving relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve "good query terms" in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query space, and discuss the use of a specially developed fitness function that favors the construction of queries containing novel but related terms.
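
The general shape of the approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the selection scheme, the crossover operator, or the fitness formula, so everything below is an assumption made for illustration. In particular, `fitness` is a toy stand-in that scores a query against a hand-made set of related terms instead of querying a search engine, selection is simple truncation with elitism, and the names `evolve`, `mutate`, `crossover`, `related_terms`, and `mutation_pool` are hypothetical.

```python
import random


def fitness(query, related_terms, context_terms):
    """Toy stand-in for a retrieval-based fitness (an assumption).

    Each distinct term that is related to the topic earns 1.0, plus a
    0.5 novelty bonus if it is not already in the initial context.
    Dividing by the query length means repeated terms lower the score.
    """
    score = 0.0
    for term in set(query):
        if term in related_terms:
            score += 1.0
            if term not in context_terms:
                score += 0.5  # novel but related term
    return score / len(query)


def crossover(a, b, rng):
    """Single-point crossover between two queries of equal length (>= 2)."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]


def mutate(query, mutation_pool, rate, rng):
    """Replace each term, with probability `rate`, by a term drawn from
    the mutation pool, which is how new vocabulary enters the population."""
    return [rng.choice(mutation_pool) if rng.random() < rate else term
            for term in query]


def evolve(context_terms, mutation_pool, related_terms, pop_size=20,
           query_len=4, generations=30, mutation_rate=0.1, seed=0):
    """Evolve a population of queries; returns it sorted best-first."""
    rng = random.Random(seed)

    def score(q):
        return fitness(q, related_terms, context_terms)

    # Initial queries are built only from the topic's context vocabulary.
    population = [[rng.choice(context_terms) for _ in range(query_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        parents = sorted(population, key=score, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            children.append(mutate(child, mutation_pool, mutation_rate, rng))
        population = parents + children
    return sorted(population, key=score, reverse=True)


if __name__ == "__main__":
    context = ["genetic", "algorithm", "query", "search"]
    pool = ["mutation", "fitness", "crossover", "banana", "weather"]
    related = set(context) | {"mutation", "fitness", "crossover"}
    best = evolve(context, pool, related)[0]
    print(best, fitness(best, related, context))
```

In this sketch, raising `mutation_rate` draws more terms from the pool and so explores more of the query space, at the cost of disrupting queries that already score well; the novelty bonus in `fitness` is one simple way to favor terms that are related to the topic but absent from the initial context.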