Genetic algorithms for approximate similarity queries

Authors:
Renato Bueno;Agma J. M. Traina;Caetano Traina, Jr.
Affiliations:
Computer Science Department - ICMC, University of São Paulo at São Carlos - USP, Av. Trabalhador São-carlense, 400, 13560-970 São Carlos, SP, Brazil (all authors)
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

Algorithms that query large sets of simple data (numbers and short character strings) are designed to retrieve every relevant element, so the answer is said to be exact. Similarity searching over complex data is much more expensive than searching over simple data. Moreover, comparison operations over complex data usually consider features extracted from each element rather than the elements themselves. Thus, even when an algorithm retrieves an exact answer, it is 'exact' with respect to the extracted features, not the original elements. Trading exactness for query response time can therefore be worthwhile. In this work we developed two search strategies based on genetic algorithms that retrieve approximate answers from data indexed by Metric Access Methods (MAM) within a limited, user-defined amount of time. These strategies support algorithms for both range and k-nearest neighbor queries, and also allow estimating the precision of the approximate answer. Experimental evaluation shows that very good results (corresponding to what the user would expect) can be obtained in a fraction of the time required to obtain the exact answer.
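
To make the idea of a time-bounded, genetic-algorithm-driven k-nearest neighbor query more concrete, here is a minimal Python sketch. It is not the authors' method: it ignores the Metric Access Method entirely and evolves candidate answer sets directly over the data. The function names (`approximate_knn`, `euclidean`), the parameters (`pop_size`, `mutation_rate`, `seed`), and the choice of crossover and mutation operators are all assumptions made for illustration only.

```python
import heapq
import random
import time


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def approximate_knn(data, query, k, time_limit_s, pop_size=20,
                    mutation_rate=0.2, dist=euclidean, seed=None):
    """Illustrative GA: evolve sets of k indices until the time budget ends.

    Hypothetical sketch, not the strategy described in the paper.
    Returns k distinct indices into `data`, ordered by distance to `query`.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    cache = {}

    def d(i):
        # Cache distances so each element is compared to the query once.
        if i not in cache:
            cache[i] = dist(data[i], query)
        return cache[i]

    def fitness(individual):
        # Lower is better: total distance of the candidate answer set.
        return sum(d(i) for i in individual)

    # Initial population: random candidate answers of k distinct indices.
    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    deadline = time.monotonic() + time_limit_s

    while time.monotonic() < deadline:
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: keep the k closest elements of both parents.
            child = heapq.nsmallest(k, set(a) | set(b), key=d)
            if rng.random() < mutation_rate:
                # Mutation: swap one slot for a random unseen element.
                candidate = rng.randrange(n)
                if candidate not in child:
                    child[rng.randrange(k)] = candidate
            children.append(tuple(child))
        population = survivors + children

    best = min(population, key=fitness)
    return sorted(best, key=d)
```

A call such as `approximate_knn(points, q, k=10, time_limit_s=0.05)` returns whatever the best candidate set is when the budget expires. Because the better half of each generation is carried over unchanged, the best total distance never gets worse over time. Estimating the precision of the answer, which the abstract mentions, is not covered by this sketch.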