Succinct nearest neighbor search

Authors:
Eric Sadit Tellez;Edgar Chávez;Gonzalo Navarro
Affiliations:
Universidad Michoacana de San Nicolás de Hidalgo, México;Universidad Michoacana de San Nicolás de Hidalgo, México;University of Chile, Chile
Venue:
Proceedings of the Fourth International Conference on SImilarity Search and APplications
Year:
2011

Abstract

In this paper we present a novel technique for nearest neighbor search, dubbed neighborhood approximation. The central idea is to divide the database into compact regions, each represented by a single object called the reference. To search for nearest neighbors, a set of candidate references is first obtained and then expanded with the database objects associated with those references. This approach can be implemented with an inverted index, which in turn can be represented succinctly, spending just a few bits per object. As a consequence, the index can be stored in main memory even for relatively large databases. The speed/compression/recall tradeoff achieved is excellent. To obtain 92% recall in 30-nearest-neighbor searches, the index reviews less than 0.6% of the database, in times ranging from 0.35 to 2.67 seconds and using from 93 down to 24 Mbytes, respectively, for a database of ten million objects. The tradeoff comes from using different compression techniques. The uncompressed index requires 0.17 seconds and 267 Mbytes of space. A quality measure complementary to recall is the ratio between the covering radius of the actual nearest neighbors and that of the near neighbors reported by the algorithm. Under this measure, our results are within a small constant factor of the exact results.
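The search scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Euclidean points, assumes each object is associated with its `k_refs` closest references, and stores the inverted index as plain Python lists rather than in a succinct (compressed) form. All names (`build_index`, `search`, `recall`, `radius_ratio`, `k_refs`) are hypothetical and chosen for illustration only. The radius ratio is written here as reported radius over exact radius, so a value of 1.0 means the reported neighbors are as tight as the true ones.

```python
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_refs(obj, refs, k_refs, dist=euclidean):
    """Ids of the k_refs references closest to obj."""
    order = sorted(range(len(refs)), key=lambda r: dist(obj, refs[r]))
    return order[:k_refs]


def build_index(db, refs, k_refs, dist=euclidean):
    """Inverted index: reference id -> ids of the objects associated with it."""
    index = defaultdict(list)
    for oid, obj in enumerate(db):
        for r in nearest_refs(obj, refs, k_refs, dist):
            index[r].append(oid)
    return index


def search(query, db, refs, index, k, k_refs, dist=euclidean):
    """Approximate k-NN: collect the objects listed under the query's
    closest references, then rank only those candidates by true distance.
    Returns (ids of reported neighbors, number of candidates reviewed)."""
    candidates = set()
    for r in nearest_refs(query, refs, k_refs, dist):
        candidates.update(index.get(r, ()))
    ranked = sorted(candidates, key=lambda oid: dist(query, db[oid]))
    return ranked[:k], len(candidates)


def exact_knn(query, db, k, dist=euclidean):
    """Brute-force k-NN, used only as ground truth."""
    return sorted(range(len(db)), key=lambda oid: dist(query, db[oid]))[:k]


def recall(approx, exact):
    """Fraction of the true neighbors that were reported."""
    return len(set(approx) & set(exact)) / len(exact)


def radius_ratio(query, db, approx, exact, dist=euclidean):
    """Covering radius of the reported neighbors divided by the covering
    radius of the true neighbors."""
    r_exact = max(dist(query, db[o]) for o in exact)
    r_approx = max(dist(query, db[o]) for o in approx)
    return r_approx / r_exact


if __name__ == "__main__":
    rng = random.Random(1)
    db = [(rng.random(), rng.random()) for _ in range(5000)]
    refs = rng.sample(db, 64)  # references drawn from the database (assumption)
    index = build_index(db, refs, k_refs=3)
    q = (0.3, 0.7)
    approx, reviewed = search(q, db, refs, index, k=30, k_refs=3)
    exact = exact_knn(q, db, 30)
    print("reviewed fraction:", reviewed / len(db))
    print("recall:", recall(approx, exact))
    print("radius ratio:", radius_ratio(q, db, approx, exact))
```

In this sketch, raising `k_refs` enlarges the candidate set, which tends to improve recall at the cost of reviewing more of the database. The space savings reported in the abstract would come from compressing the posting lists, which this sketch does not attempt.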