On the least cost for proximity searching in metric spaces

Authors:
Karina Figueroa;Edgar Chávez;Gonzalo Navarro;Rodrigo Paredes
Affiliations:
Universidad Michoacana, México;Universidad Michoacana, México;Center for Web Research, Dept. of Computer Science, Universidad de Chile;Center for Web Research, Dept. of Computer Science, Universidad de Chile
Venue:
WEA'06: Proceedings of the 5th International Conference on Experimental Algorithms
Year:
2006

Abstract

Proximity searching consists of retrieving from a database those elements that are similar to a query. Since the distance is usually expensive to compute, the goal is to answer queries using as few distance computations as possible. Indexes use precomputed distances among database elements to speed up queries. An extreme case is AESA, which stores all the distances among database objects; despite its quadratic memory cost, it has remained unbeaten in query performance for 20 years, which makes it the natural baseline. In this paper we show that it is possible to improve upon AESA by using a radically different method to select the promising database elements to compare against the query. Our experiments show improvements of up to 75% in document databases. We also explore the use of our method as a probabilistic algorithm that may lose relevant answers. On a database of faces, where any exact algorithm must examine virtually all elements, our probabilistic version obtains 85% of the correct answers by scanning only 10% of the database.
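To make the baseline concrete, the sketch below shows a classic AESA-style range search: with the full distance matrix precomputed, each comparison against the query tightens a triangle-inequality lower bound for every remaining element, and the next element compared is the one with the smallest lower bound. This is the standard AESA selection rule, not the new selection method proposed in this paper, which the abstract does not spell out. The names `aesa_range_search`, `dmat` and the optional `budget` parameter are hypothetical; `budget` is only an illustrative stand-in for a probabilistic variant that stops early and may therefore lose answers, not the authors' stopping rule.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def aesa_range_search(
    db: Sequence,
    dmat: List[List[float]],
    query,
    radius: float,
    dist: Callable,
    budget: Optional[int] = None,
) -> Tuple[List[int], int]:
    """Return (indices u with dist(query, db[u]) <= radius, distance evaluations used).

    dmat[i][j] is the precomputed distance between db[i] and db[j] (quadratic memory).
    If budget is given (hypothetical parameter), stop after that many evaluations;
    the answer may then be incomplete.
    """
    n = len(db)
    lower = [0.0] * n        # lower bound on d(query, db[u]) for each u
    alive = set(range(n))    # candidates neither compared nor discarded yet
    result: List[int] = []
    evals = 0
    while alive and (budget is None or evals < budget):
        # AESA selection rule: compare next the candidate with the smallest lower bound.
        p = min(alive, key=lambda u: lower[u])
        alive.remove(p)
        dqp = dist(query, db[p])
        evals += 1
        if dqp <= radius:
            result.append(p)
        # Triangle inequality: |d(q,p) - d(p,u)| <= d(q,u). Tighten bounds and prune.
        for u in list(alive):
            lower[u] = max(lower[u], abs(dqp - dmat[p][u]))
            if lower[u] > radius:
                alive.remove(u)
    return sorted(result), evals


if __name__ == "__main__":
    points = [0.0, 1.0, 2.5, 4.0, 7.0, 7.5, 10.0]
    dmat = [[abs(a - b) for b in points] for a in points]
    d = lambda a, b: abs(a - b)
    print(aesa_range_search(points, dmat, 7.2, 0.5, d))            # exact search
    print(aesa_range_search(points, dmat, 7.2, 0.5, d, budget=1))  # early stop
```

In this toy one-dimensional example the exact search returns indices 4 and 5 after only a few distance evaluations, while the budget-limited call spends a single evaluation and can miss both answers, which is the trade-off the probabilistic variant accepts.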