Reducing the overhead of the AESA metric-space nearest neighbour searching algorithm
Information Processing Letters
A fast branch & bound nearest neighbour classifier in metric spaces
Pattern Recognition Letters
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Modern Information Retrieval
Searching in metric spaces with user-defined and approximate distances
ACM Transactions on Database Systems (TODS)
Probabilistic proximity search: fighting the curse of dimensionality in metric spaces
Information Processing Letters
SIAM Journal on Discrete Mathematics
Index-driven similarity search in metric spaces (Survey Article)
ACM Transactions on Database Systems (TODS)
Probabilistic proximity searching algorithms based on compact partitions
Journal of Discrete Algorithms - SPIRE 2002
Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling)
A compact space decomposition for effective metric indexing
Pattern Recognition Letters
Similarity Search: The Metric Space Approach (Advances in Database Systems)
Similarity Search: The Metric Space Approach (Advances in Database Systems)
Engineering efficient metric indexes
Pattern Recognition Letters
t-Spanners for metric space searching
Data & Knowledge Engineering
Effective Proximity Retrieval by Ordering Permutations
IEEE Transactions on Pattern Analysis and Machine Intelligence
Approximate similarity search: A multi-faceted problem
Journal of Discrete Algorithms
A fast pivot-based indexing algorithm for metric spaces
Pattern Recognition Letters
Hi-index | 0.00 |
Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speedup query processing. Among all the known indices, the baseline for performance for about 20 years has been AESA. This index uses an iterative procedure, where at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards database elements that can be proved not relevant to the query using the pivot. The next pivot in AESA is chosen as the one minimizing the sum of lower bounds to the distance to the query proved by previous pivots. In this article, we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. The difference with AESA is the method to select the next pivot. In iAESA, each candidate sorts previous pivots by closeness to it, and chooses the next pivot as the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a very important and difficult real-life instance of the problem. For the probabilistic algorithm, we perform in a database of faces up to 40% of the comparisons made by the best alternative algorithm to retrieve the same percentage of the correct answer. Based on the empirical results, we conjecture that the new probabilistic AESA-like algorithms will become, as AESA had been for exact algorithms, a reference point establishing, in practice, a lower bound on how good a probabilistic proximity search algorithm can be.