Impact of the initialization in tree-based fast similarity search techniques

Authors:
Aureo Serrano;Luisa Micó;Jose Oncina
Affiliations:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Venue:
SIMBAD'11 Proceedings of the First international conference on Similarity-based pattern recognition
Year:
2011

Abstract

Many fast similarity search techniques rely on pivots (specially selected points in the data set). These points are used to build specific structures (indexes) that speed up the search at query time. Pivot selection techniques are usually incremental, with the first pivot chosen at random. This article explores several techniques for choosing the first pivot in a tree-based fast similarity search technique. We provide experimental results showing that an adequate choice of this pivot leads to significant reductions in distance computations and time complexity. Moreover, most pivot tree-based indexes emphasize building balanced trees. We provide experimental and theoretical support for the claim that very unbalanced trees can be a better choice than balanced ones.
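
To make the idea of incremental pivot selection with a configurable first pivot concrete, here is a minimal sketch. It is not the method evaluated in the article. It assumes a generic farthest-first rule for the subsequent pivots, and the names `select_pivots` and `farthest_from_random` are hypothetical helpers introduced only for illustration. The abstract does not say which initialization strategies were compared; "farthest point from a random point" is just one plausible alternative to a purely random start.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def select_pivots(points: Sequence[T], dist: Distance, k: int,
                  first: Optional[int] = None, seed: int = 0) -> List[int]:
    """Return indices of k pivots chosen incrementally (farthest-first).

    Assumption: each new pivot is the point farthest from all pivots
    chosen so far. If `first` is None, the first pivot is random.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    if first is None:
        first = random.Random(seed).randrange(len(points))
    pivots = [first]
    # min_dist[i] holds the distance from point i to its nearest pivot.
    min_dist = [dist(p, points[first]) for p in points]
    while len(pivots) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        pivots.append(nxt)
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return pivots


def farthest_from_random(points: Sequence[T], dist: Distance,
                         seed: int = 0) -> int:
    """Hypothetical initialization: index of the point farthest
    from a randomly drawn reference point."""
    ref = points[random.Random(seed).randrange(len(points))]
    return max(range(len(points)), key=lambda i: dist(points[i], ref))


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 10.0]
    metric = lambda a, b: abs(a - b)
    start = farthest_from_random(data, metric)
    print(select_pivots(data, metric, k=3, first=start))
```

Passing `first` explicitly lets one initialization be swapped for another while the rest of the selection stays fixed. That is the kind of comparison the abstract describes, although the strategies and index structure used in the article may differ.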