Speeding up spatial approximation search in metric spaces

Authors:
Karina Figueroa; Edgar Chavez; Gonzalo Navarro; Rodrigo Paredes
Affiliations:
Universidad Michoacana, Mexico; Universidad Michoacana/CICESE, Mexico; University of Chile, Chile; University of Chile, Chile
Venue:
Journal of Experimental Algorithmics (JEA)
Year:
2010

Abstract

Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, AESA has been the performance baseline for about 20 years. This index uses an iterative procedure: at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then uses the pivot to discard database elements that can be proved not relevant to the query. AESA chooses as the next pivot the candidate that minimizes the sum of the lower bounds on its distance to the query proved by the previous pivots. In this article, we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. It differs from AESA in the method used to select the next pivot. In iAESA, each candidate sorts the previous pivots by closeness to it, and the next pivot is the candidate whose ordering is most similar to that of the query. We also propose a modification to AESA-like algorithms that turns them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a very important and difficult real-life instance of the problem. For the probabilistic algorithm, in a database of faces we perform up to 40% of the comparisons made by the best alternative algorithm to retrieve the same percentage of the correct answer. Based on the empirical results, we conjecture that the new probabilistic AESA-like algorithms will become, as AESA has been for exact algorithms, a reference point establishing, in practice, a lower bound on how good a probabilistic proximity search algorithm can be.
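The two pivot-selection rules described in the abstract can be illustrated with a minimal Python sketch of a range query. This is an illustration under stated assumptions, not the authors' implementation. The names `range_search`, `footrule`, `ranks`, and the `strategy` parameter are hypothetical. The abstract does not say how the similarity between two orderings is measured, so the Spearman footrule used here is an assumption. `D` is assumed to be the full matrix of precomputed distances between database elements, and pruning relies on the triangle inequality: `|d(q, p) - d(p, u)|` is a lower bound on `d(q, u)`.

```python
def ranks(values):
    """Position of each item when items are sorted by value (ties broken by index)."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    rank = [0] * len(values)
    for pos, i in enumerate(order):
        rank[i] = pos
    return rank


def footrule(a, b):
    """Spearman footrule between the orderings induced by a and b (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(ranks(a), ranks(b)))


def range_search(db, dist, D, q, r, strategy="aesa"):
    """Return (sorted indices of elements within distance r of q, distance evaluations).

    strategy="aesa":  next pivot minimizes the sum of lower bounds.
    strategy="iaesa": next pivot is the candidate whose ordering of the
                      previous pivots is closest to the query's ordering.
    """
    n = len(db)
    candidates = set(range(n))
    sum_lb = [0] * n          # sum of lower bounds on d(q, u) per element
    pivots, dq = [], []       # pivots used so far and their distances to q
    matches, evals = [], 0
    while candidates:
        if strategy == "aesa" or not pivots:
            p = min(candidates, key=lambda u: (sum_lb[u], u))
        else:
            # Compare the query's view of the pivots (dq) with candidate u's view.
            p = min(candidates,
                    key=lambda u: (footrule(dq, [D[v][u] for v in pivots]), u))
        candidates.remove(p)
        d = dist(q, db[p])    # the only expensive operation
        evals += 1
        if d <= r:
            matches.append(p)
        pivots.append(p)
        dq.append(d)
        for u in list(candidates):
            lb = abs(d - D[p][u])   # triangle-inequality lower bound on d(q, u)
            if lb > r:
                candidates.remove(u)  # provably outside the query ball
            else:
                sum_lb[u] += lb
    return sorted(matches), evals
```

Both strategies return the same answer set, because pruning is identical. Only the order in which pivots are tried, and therefore the number of distance evaluations, differs. The sketch recomputes every ordering from scratch at each iteration, which keeps the code short but ignores the internal CPU cost that a practical implementation would need to control.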