In this paper we present a novel technique for nearest neighbor searching called neighborhood approximation. The central idea is to divide the database into compact regions, each represented by a single object called the reference. To search for nearest neighbors, a set of candidate references is obtained first and then expanded with the database objects associated with those references. This approach can be implemented with an inverted index, which in turn can be represented succinctly, spending just a few bits per object. As a consequence, the index can be stored in main memory even for relatively large databases. The speed/compression/recall tradeoff achieved is excellent: to obtain 92% recall in 30-nearest-neighbor searches on a database of ten million objects, the index reviews less than 0.6% of the database, in times ranging from 0.35 to 2.67 seconds and using from 93 to 24 Mbytes. The tradeoff comes from using different compression techniques; the uncompressed index requires 0.17 seconds and 267 Mbytes of space. A quality measure complementary to recall is the ratio between the covering radius of the actual nearest neighbors and that of the near neighbors reported by the algorithm. By this measure, our results are within a small constant of the exact results.
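The search scheme described above can be illustrated with a minimal sketch. The abstract does not give the construction details, so everything concrete here is an assumption made for illustration: the Euclidean distance, the rule that each object is posted to its `k` closest references, and the hypothetical names `build_index`, `search`, and `recall`. The inverted index is a plain dictionary, not the succinct (compressed) representation the paper refers to.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    # Assumed metric; any metric distance function could be swapped in.
    return math.dist(a, b)

def nearest_refs(obj, refs, k, dist=euclidean):
    """Return the ids of the k references closest to obj."""
    order = sorted(range(len(refs)), key=lambda r: dist(obj, refs[r]))
    return order[:k]

def build_index(db, refs, k, dist=euclidean):
    """Inverted index: reference id -> ids of the objects posted to it.

    Assumption: each object is posted to its k closest references.
    """
    index = defaultdict(list)
    for obj_id, obj in enumerate(db):
        for r in nearest_refs(obj, refs, k, dist):
            index[r].append(obj_id)
    return index

def search(query, db, refs, index, k, n_neighbors, dist=euclidean):
    """Approximate n_neighbors-nearest-neighbor search.

    1. Find the candidate references closest to the query.
    2. Expand them with the objects in their posting lists.
    3. Rank only those candidates by true distance to the query.
    Returns (ids of the reported neighbors, number of objects reviewed).
    """
    candidates = set()
    for r in nearest_refs(query, refs, k, dist):
        candidates.update(index.get(r, ()))
    ranked = sorted(candidates, key=lambda i: dist(query, db[i]))
    return ranked[:n_neighbors], len(candidates)

def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that were reported."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Tiny toy example: two well-separated clusters, one reference per cluster.
db = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
refs = [(0.0, 0.0), (10.0, 10.0)]
index = build_index(db, refs, k=1)
found, reviewed = search((0.1, 0.1), db, refs, index, k=1, n_neighbors=2)
print(found, reviewed, recall(found, [0, 1]))
```

In this toy setting only the posting list of the closest reference is scanned, so half of the database is never compared against the query. The fraction of the database reviewed and the recall both depend on the number of references and on `k`; the figures quoted in the abstract come from the authors' own implementation and are not reproduced by this sketch.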