Clustered pivot tables for I/O-optimized similarity search

Authors:
Juraj Moško, Jakub Lokoč, Tomáš Skopal
Affiliations:
Charles University in Prague (all authors)
Venue:
Proceedings of the Fourth International Conference on Similarity Search and Applications
Year:
2011

Abstract

Pivot tables are a popular metric access method, designed primarily as a main-memory index structure. It has been shown many times that pivot tables are very efficient in terms of the number of distance computations, which makes them a good choice when the distance function is computationally expensive. However, for cheaper distance functions and/or huge datasets that exceed the capacity of main memory, classic pivot tables become inefficient. The situation is changing dramatically with the rise of solid-state disks, whose low seek times make it possible to efficiently access even small fragments of data stored in secondary memory. In this paper, we propose a persistent variant of pivot tables, the clustered pivot tables, which focuses on minimizing I/O cost when accessing small data blocks (a few kilobytes). Clustered pivot tables employ a preprocessing method that uses the M-tree as a clustering technique, together with an original heuristic for I/O-optimized kNN query processing. Our experiments show empirically that the proposed method significantly reduces the number of I/O operations needed during query processing.
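The abstract describes the approach only at a high level. As a rough illustration, the Python sketch below combines the two ingredients it names: pivot-table filtering, where precomputed object-to-pivot distances yield triangle-inequality lower bounds that save distance computations, and a block-oriented kNN search that counts simulated I/O operations. Everything concrete here is an assumption made for illustration: the hypothetical class `BlockedPivotTable`, Euclidean distance as the metric, keeping the pivot distances in memory while the objects live in small blocks, and the block-visiting heuristic (blocks ordered by their smallest lower bound). This is not the authors' heuristic, and the naive sort-and-chunk grouping in the usage part merely stands in for the M-tree-based clustering the paper uses.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Example metric (assumption); any metric obeying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BlockedPivotTable:
    """Hypothetical sketch, not the paper's data structure.

    Objects live in small blocks (simulated disk pages); the pivot table, i.e. the
    distances from every object to every pivot, is assumed to fit in main memory.
    """

    def __init__(self, blocks: List[List[Vector]], pivots: List[Vector], dist: Metric = euclid):
        self.dist = dist
        self.pivots = pivots
        self.blocks = blocks  # "on disk": reading one block costs one simulated I/O
        # In-memory pivot table: table[b][i] holds d(blocks[b][i], p) for every pivot p.
        self.table = [[[dist(o, p) for p in pivots] for o in block] for block in blocks]
        self.block_reads = 0

    @staticmethod
    def lower_bound(q_piv: List[float], o_piv: List[float]) -> float:
        # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
        return max(abs(a - b) for a, b in zip(q_piv, o_piv))

    def knn(self, q: Vector, k: int) -> List[Tuple[float, Vector]]:
        if k <= 0:
            return []
        q_piv = [self.dist(q, p) for p in self.pivots]
        # Per-object lower bounds, computed without reading any block.
        bounds = [[self.lower_bound(q_piv, row) for row in rows] for rows in self.table]
        # Heuristic (assumption): visit blocks by their smallest object lower bound.
        order = sorted(range(len(self.blocks)), key=lambda b: min(bounds[b], default=math.inf))
        heap: List[Tuple[float, int, Vector]] = []  # max-heap via negated distances
        counter = 0  # tie-breaker so heap entries never compare objects
        for b in order:
            radius = -heap[0][0] if len(heap) == k else math.inf
            if min(bounds[b], default=math.inf) >= radius:
                break  # blocks are sorted, so no later block can improve the result
            self.block_reads += 1  # one simulated I/O per visited block
            for obj, lb in zip(self.blocks[b], bounds[b]):
                radius = -heap[0][0] if len(heap) == k else math.inf
                if lb >= radius:
                    continue  # filtered by the pivot lower bound, no distance computed
                d = self.dist(q, obj)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, counter, obj))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, counter, obj))
                counter += 1
        return sorted((-nd, obj) for nd, _, obj in heap)


if __name__ == "__main__":
    points = [(float(i), 0.0) for i in range(10)]
    # Naive grouping of nearby objects into blocks (stand-in for M-tree clustering).
    blocks = [points[0:3], points[3:6], points[6:10]]
    index = BlockedPivotTable(blocks, pivots=[(0.0, 0.0), (9.0, 0.0)])
    result = index.knn((1.2, 0.0), k=2)
    print([obj for _, obj in result])  # expected neighbours: (1.0, 0.0) and (2.0, 0.0)
    print("blocks read:", index.block_reads)
```

Because blocks are visited in ascending order of their smallest lower bound, the search can stop as soon as that bound reaches the current k-th nearest distance. Grouping nearby objects into the same block is what lets this cutoff come early and keeps `block_reads` low, which is the intuition behind clustering the pivot table in the first place.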