Polyphasic metric index: reaching the practical limits of proximity searching

Authors: Eric Sadit Tellez; Edgar Chavez; Karina Figueroa
Affiliation (all authors): Universidad Michoacana de San Nicolás de Hidalgo, México
Venue: SISAP '12: Proceedings of the 5th International Conference on Similarity Search and Applications
Year: 2012

Abstract

Some metric indexes, such as the pivot-based family, can natively trade space for query time. Other indexes may have a small memory footprint and still outperform the pivot-based approach, but they cannot use additional memory to reduce query time. In this paper we propose a new metric indexing technique with an algorithmic mechanism that lifts the performance of otherwise rigid metric indexes. We chose the well-known List of Clusters (LC) as the base data structure. Compared with the original LC, the resulting index is orders of magnitude faster to build, has memory usage adaptable to the intrinsic dimension of the data, and is faster at query time. We also present a nearest-neighbor algorithm, of independent interest, which is optimal in the sense that it requires the same number of distance computations as a range query whose radius is the distance to the nearest neighbor. We present exhaustive experimental evidence supporting our claims on both synthetic and real-world datasets.
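As a reference point for the base structure named above, here is a minimal Python sketch of the classic List of Clusters with a fixed bucket size, a range query, and a nearest-neighbor search. It is not the polyphasic index proposed in the paper. The names `Cluster`, `build_lc`, `range_search` and `nn_search` are hypothetical, and so are the random choice of centers and the Euclidean metric. The nearest-neighbor routine is a plain shrinking-radius search, not the optimal algorithm the abstract describes. The early-exit rule relies on one property: each bucket holds the `m` objects closest to its center, so every object in a later cluster lies at least the covering radius away from that center. If the query ball lies strictly inside the cluster ball, no later object can qualify.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class Cluster:
    center: Point
    cover_radius: float  # distance from the center to its farthest bucket member
    # (object, distance to center), stored so the triangle inequality can filter
    bucket: List[Tuple[Point, float]] = field(default_factory=list)


def build_lc(data: Sequence[Point], m: int, dist: Metric, seed: int = 0) -> List[Cluster]:
    """Classic LC with fixed bucket size m; random centers are an assumption."""
    rng = random.Random(seed)
    rest = list(data)
    clusters: List[Cluster] = []
    while rest:
        c = rest.pop(rng.randrange(len(rest)))  # hypothetical center heuristic: random
        scored = sorted(((dist(u, c), u) for u in rest), key=lambda t: t[0])
        bucket = [(u, d) for d, u in scored[:m]]  # the m objects nearest to c
        rest = [u for _, u in scored[m:]]         # all remaining have d(u, c) >= cover radius
        cr = bucket[-1][1] if bucket else 0.0
        clusters.append(Cluster(c, cr, bucket))
    return clusters


def range_search(clusters: List[Cluster], q: Point, r: float, dist: Metric) -> List[Point]:
    """Return every indexed object within distance r of q."""
    out: List[Point] = []
    for cl in clusters:
        d = dist(q, cl.center)
        if d <= r:
            out.append(cl.center)
        if d <= cl.cover_radius + r:  # query ball may intersect the cluster ball
            for u, du in cl.bucket:
                # |d(q,c) - d(u,c)| is a lower bound on d(q,u); check it before computing d(q,u)
                if abs(d - du) <= r and dist(q, u) <= r:
                    out.append(u)
        if d + r < cl.cover_radius:  # query ball strictly inside: later clusters cannot qualify
            break
    return out


def nn_search(clusters: List[Cluster], q: Point, dist: Metric) -> Tuple[Optional[Point], float]:
    """Exact 1-NN via a range search whose radius shrinks to the best distance seen."""
    best: Optional[Point] = None
    best_d = math.inf
    for cl in clusters:
        d = dist(q, cl.center)
        if d < best_d:
            best, best_d = cl.center, d
        if d <= cl.cover_radius + best_d:
            for u, du in cl.bucket:
                if abs(d - du) < best_d:  # otherwise u cannot be strictly closer
                    dq = dist(q, u)
                    if dq < best_d:
                        best, best_d = u, dq
        if d + best_d < cl.cover_radius:  # same early-exit rule with the current radius
            break
    return best, best_d
```

For example, `build_lc(points, 10, math.dist)` indexes a list of 2-D tuples, and `range_search(index, (0.5, 0.5), 0.1, math.dist)` returns the points within 0.1 of the query. Because its radius only shrinks, the shrinking-radius search can compute more distances than a range query run directly with the final nearest-neighbor radius. Closing that gap is what the paper's optimal algorithm addresses.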