Genetic algorithms for approximate similarity queries

  • Authors:
  • Renato Bueno;Agma J. M. Traina;Caetano Traina, Jr.

  • Affiliations:
  • Computer Science Department - ICMC, University of São Paulo at São Carlos - USP, Av. Trabalhador São-carlense, 400, 13560-970 São Carlos, SP, Brazil;Computer Science Department - ICMC, University of São Paulo at São Carlos - USP, Av. Trabalhador São-carlense, 400, 13560-970 São Carlos, SP, Brazil;Computer Science Department - ICMC, University of São Paulo at São Carlos - USP, Av. Trabalhador São-carlense, 400, 13560-970 São Carlos, SP, Brazil

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

Algorithms that query large sets of simple data (composed of numbers and small character strings) are built to retrieve every relevant element, so the answer is said to be exact. Similarity searching over complex data is much more expensive than searching over simple data. Moreover, comparison operations over complex data usually consider features extracted from each element, instead of the elements themselves. Thus, even if an algorithm retrieves an exact answer, it is 'exact' with respect to the extracted features, not the original elements. Therefore, trading exactness for shorter query response time can be worthwhile. In this work we developed two search strategies based on genetic algorithms that retrieve approximate answers from data indexed by Metric Access Methods (MAM) within a limited, user-defined amount of time. These strategies can be used to implement algorithms that answer both range and k-nearest neighbor queries, and they also allow the precision of the approximate answer to be estimated. Experimental evaluation shows that very good results (corresponding to what the user would expect) can be obtained in a fraction of the time required to obtain the exact answer.
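To make the idea of a time-bounded genetic search concrete, the following Python sketch answers an approximate k-nearest neighbor query and stops when a user-defined time budget runs out. It is not the authors' method: the abstract does not describe their chromosome encoding or operators, and their strategies work on data indexed by a MAM, whereas this sketch scans a flat in-memory list. The function names (`distance`, `approximate_knn`), the parameters (`pop_size`, `seed`), the Euclidean distance, and the crossover and mutation rules are all assumptions chosen for illustration.

```python
import random
import time


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def approximate_knn(data, query, k, time_limit_s, pop_size=20, seed=0):
    """Hypothetical GA sketch: each individual is a tuple of k distinct indices into data."""
    rng = random.Random(seed)
    n = len(data)
    cache = {}  # index -> distance to query, so each element is compared at most once

    def dist(i):
        if i not in cache:
            cache[i] = distance(data[i], query)
        return cache[i]

    def fitness(ind):
        # Lower is better: total distance of the k candidates to the query.
        return sum(dist(i) for i in ind)

    def crossover(p1, p2):
        # The child keeps the k closest elements found in either parent.
        return tuple(sorted(set(p1) | set(p2), key=dist)[:k])

    def mutate(ind):
        # Replace one random member with a random index not already present.
        members = list(ind)
        candidate = rng.randrange(n)
        if candidate not in members:
            members[rng.randrange(k)] = candidate
        return tuple(members)

    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    deadline = time.monotonic() + time_limit_s  # user-defined time budget
    while time.monotonic() < deadline:
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # elitist selection: keep the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = min(population, key=fitness)
    return sorted(best, key=dist)  # indices ordered from nearest to farthest
```

Calling `approximate_knn(data, query, k=5, time_limit_s=0.05)` on a list of feature vectors returns five indices into `data`. A larger time budget gives the search more generations and, in general, an answer closer to the exact one. The sketch does not estimate precision; on small data it can be measured afterwards by comparing the result with a brute-force scan.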