The list of clusters revisited

Authors:
Eric Sadit Tellez;Edgar Chávez
Affiliations:
Universidad Michoacana de San Nicolás de Hidalgo, México;Universidad Michoacana de San Nicolás de Hidalgo, México
Venue:
MCPR'12 Proceedings of the 4th Mexican conference on Pattern Recognition
Year:
2012

Abstract

One of the most efficient indexes for similarity search is the so-called list of clusters; to fix ideas, think of speeding up k-nn searches in a very large database. This data structure is a counterintuitive construction that can be seen as extremely unbalanced, in contrast to the balanced data structures used for exact searching. In practical terms, there is no better alternative for exact indexing, where every search returns all the relevant results, as opposed to approximate similarity search. The major drawback of the list of clusters is its quadratic construction time. In this paper we revisit the list of clusters, aiming to speed up construction without sacrificing search efficiency. We obtain similar search times while saving a significant amount of time in the construction phase.
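
The abstract does not spell out the structure itself. As a rough illustration only, the following Python sketch shows one common textbook form of the list of clusters with fixed-size buckets, not the revisited construction proposed in this paper. The names build_loc, range_search and euclidean, the random choice of centers, and the bucket_size parameter are all assumptions made for this example. Each step picks a center, scans every element still unassigned, keeps the nearest ones as a bucket, and records the covering radius; the repeated scans are where the quadratic construction cost comes from when buckets are small.

```python
import random


def euclidean(a, b):
    """Plain Euclidean distance; any metric could be substituted."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_loc(points, bucket_size, dist=euclidean, seed=0):
    """Build a list of (center, covering_radius, bucket) triples.

    Assumption: centers are picked at random; other heuristics exist.
    """
    rng = random.Random(seed)
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        # One full pass over everything still unassigned.
        scored = sorted(((dist(center, p), p) for p in remaining),
                        key=lambda t: t[0])
        bucket = scored[:bucket_size]          # nearest elements, with distances
        remaining = [p for _, p in scored[bucket_size:]]
        cov = bucket[-1][0] if bucket else 0.0  # covering radius
        clusters.append((center, cov, bucket))
    return clusters


def range_search(clusters, q, r, dist=euclidean):
    """Return every indexed point within distance r of q (exact search)."""
    out = []
    for center, cov, bucket in clusters:
        dqc = dist(q, center)
        if dqc <= r:
            out.append(center)
        if dqc - r <= cov:  # query ball intersects this cluster's ball
            for dcp, p in bucket:
                # Triangle inequality: skip p when |d(q,c) - d(c,p)| > r.
                if abs(dqc - dcp) <= r and dist(q, p) <= r:
                    out.append(p)
        if dqc + r < cov:
            # Query ball lies strictly inside this cluster's ball, so no
            # later cluster can hold a result.
            break
    return out
```

The early exit in range_search relies on the list being built in order: every element stored in a later cluster lies at distance at least the covering radius from each earlier center. This asymmetry between early and late clusters is what makes the structure look extremely unbalanced.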