Proximity searching in high dimensional spaces with a proximity preserving order

Authors:
Edgar Chávez;Karina Figueroa;Gonzalo Navarro
Affiliations:
Facultad de Ciencias Físico-Matemáticas, Universidad Michoacana, México;Facultad de Ciencias Físico-Matemáticas, Universidad Michoacana, México;Center for Web Research, Dept. of Computer Science, University of Chile
Venue:
MICAI'05 Proceedings of the 4th Mexican international conference on Advances in Artificial Intelligence
Year:
2005

Abstract

Kernel-based methods (such as k-nearest neighbor classifiers) for AI tasks translate the classification problem into a proximity search problem, in a space that is usually very high dimensional. Unfortunately, no proximity search algorithm does well in high dimensions. One way to overcome this problem is to use approximate and probabilistic algorithms, which trade accuracy for time. In this paper we present a new probabilistic proximity search algorithm. Its main idea is to order a set of samples based on their distance to each element. It turns out that the closeness between the order produced by an element and that produced by the query is an excellent predictor of the relevance of the element to answer the query. The performance of our method is unparalleled. For example, for a full 128-dimensional dataset, it is enough to review 10% of the database to obtain 90% of the answers, and to review less than 1% to get 80% of the correct answers. The result is more impressive when one considers that a full 128-dimensional dataset may span thousands of dimensions of clustered data. Furthermore, the concept of a proximity preserving order opens a totally new approach for both exact and approximate proximity searching.
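
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the closeness between two orders is measured, so the Spearman footrule used here is an assumption, as are the Euclidean distance, the function names (permutation, footrule, build_index, search) and the fraction parameter, which stands for the share of the database that is reviewed.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; any metric could be used instead (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(point, samples, dist=euclidean):
    """Indices of the samples, ordered by increasing distance to `point`."""
    return sorted(range(len(samples)), key=lambda i: dist(point, samples[i]))


def footrule(perm_a, perm_b):
    """Spearman footrule: sum of rank displacements between two orders.

    Assumed closeness measure; the abstract does not name one.
    """
    rank_in_b = {sample: rank for rank, sample in enumerate(perm_b)}
    return sum(abs(rank - rank_in_b[sample]) for rank, sample in enumerate(perm_a))


def build_index(database, samples, dist=euclidean):
    """Precompute the order of the samples as seen from every element."""
    return [permutation(p, samples, dist) for p in database]


def search(query, database, samples, index, k, fraction, dist=euclidean):
    """Approximate k-nearest-neighbor search.

    Elements are visited in order of how closely their sample order matches
    the query's, and only the first `fraction` of the database is compared
    to the query with the real distance.
    """
    q_perm = permutation(query, samples, dist)
    order = sorted(range(len(database)), key=lambda i: footrule(index[i], q_perm))
    budget = max(k, int(fraction * len(database)))
    candidates = order[:budget]
    candidates.sort(key=lambda i: dist(query, database[i]))
    return candidates[:k]


# Usage on synthetic data (hypothetical sizes, chosen only for illustration).
rng = random.Random(0)
database = [[rng.random() for _ in range(16)] for _ in range(500)]
samples = rng.sample(database, 32)
index = build_index(database, samples)
query = [rng.random() for _ in range(16)]
approx = search(query, database, samples, index, k=5, fraction=0.1)
```

With fraction=1.0 the sketch degenerates to an exact scan, so the quality of the ordering can be checked by comparing the results for smaller fractions against that baseline. The recall figures quoted in the abstract come from the paper and are not reproduced by this toy setup.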