Pivot tables are a popular metric access method, primarily designed as a main-memory index structure. They have repeatedly been shown to be very efficient in terms of the number of distance computations, which matters most when the distance function is computationally expensive. However, for cheaper distance functions and/or huge datasets exceeding the capacity of main memory, classic pivot tables become inefficient. The situation is changing dramatically with the rise of solid-state disks, which reduce seek times, so that even small fragments of data stored in secondary memory can now be accessed efficiently. In this paper, we propose a persistent variant of pivot tables, the clustered pivot tables, which focuses on minimizing I/O cost when accessing small data blocks (a few kilobytes). The clustered pivot tables employ a preprocessing method that uses the M-tree as a clustering technique, together with an original heuristic for I/O-optimized kNN query processing. In the experiments, we show empirically that the proposed method significantly reduces the number of I/O operations needed during query processing.
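For readers unfamiliar with the baseline, the following is a minimal sketch of how a classic main-memory pivot table can answer a kNN query while saving distance computations. It is not the clustered, disk-based variant proposed in the paper, and it does not reproduce the paper's I/O heuristic. The function names (`build_pivot_table`, `knn_query`), the Euclidean metric, and the strategy of visiting candidates in order of their lower bound are all assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Example metric (an assumption); any metric distance would work."""
    return math.dist(a, b)


def build_pivot_table(data, pivots, dist):
    """Precompute the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in data]


def knn_query(query, k, data, pivots, table, dist):
    """Return the k nearest (distance, index) pairs and the number of
    distance computations spent on the query."""
    q_to_pivots = [dist(query, p) for p in pivots]
    computed = len(pivots)

    # Triangle inequality: max_i |d(q, p_i) - d(o, p_i)| <= d(q, o),
    # so this value is a lower bound obtained from the table alone.
    bounds = sorted(
        (max(abs(qp - op) for qp, op in zip(q_to_pivots, row)), idx)
        for idx, row in enumerate(table)
    )

    heap = []  # max-heap emulated with negated distances: (-distance, idx)
    for lower, idx in bounds:
        # Candidates arrive in ascending lower-bound order, so once a bound
        # reaches the current k-th distance no later object can be closer.
        if len(heap) == k and lower >= -heap[0][0]:
            break
        d = dist(query, data[idx])
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))

    return sorted((-neg, idx) for neg, idx in heap), computed
```

In this sketch the table lookups are cheap arithmetic, and `dist` is evaluated only for objects whose lower bound cannot rule them out. When the table itself no longer fits in main memory, each lookup may instead cost a disk access, which is the setting the clustered pivot tables address.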