Slicing the metric space to provide quick indexing of complex data in the main memory

Authors:
Caio César Mori Carélo; Ives Renê Venturini Pola; Ricardo Rodrigues Ciferri; Agma Juci Machado Traina; Caetano Traina, Jr; Cristina Dutra de Aguiar Ciferri
Affiliations:
Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil; Department of Computer Science, Federal University of São Carlos, 13.565-905, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil
Venue:
Information Systems
Year:
2011

Abstract

Searching a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and metric access methods (MAMs) have been developed to support it. A growing number of applications require indices that can be built quickly and rebuilt repeatedly, while also answering similarity queries faster. The growth in main memory capacity and its falling cost further motivate the use of memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) extended range and k-NN query algorithms that support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always the fastest to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.
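
The abstract does not spell out the partitioning rule, so the following is only a minimal sketch of the general idea of slicing a metric space into disjoint subspaces around pivots and pruning them during a range query. It assumes a single node with two pivots whose mutual distance serves as the slicing radius, which yields four disjoint regions; this assumption, and every name in the code (`region_of`, `build_partition`, `region_may_intersect`, `range_query`), is hypothetical and not taken from the paper. The Onion-tree's actual partitioning, pivot replacement, and k-NN visit order are not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function obeying the triangle inequality works."""
    return math.dist(a, b)


def region_of(d1: float, d2: float, r: float) -> int:
    """Map the distances to the two pivots onto one of four disjoint regions.

    Region 0: d1 <  r and d2 <  r    Region 1: d1 <  r and d2 >= r
    Region 2: d1 >= r and d2 <  r    Region 3: d1 >= r and d2 >= r
    """
    return (0 if d1 < r else 2) + (0 if d2 < r else 1)


def build_partition(points: Sequence[Point], p1: Point, p2: Point,
                    dist: Distance = euclidean) -> Tuple[float, List[List[Point]]]:
    """Assign every point to exactly one region (assumed slicing rule)."""
    r = dist(p1, p2)  # assumption: the pivot-to-pivot distance is the radius
    buckets: List[List[Point]] = [[], [], [], []]
    for x in points:
        buckets[region_of(dist(x, p1), dist(x, p2), r)].append(x)
    return r, buckets


def region_may_intersect(region: int, dq1: float, dq2: float,
                         r: float, radius: float) -> bool:
    """Triangle-inequality test: can the query ball contain a region member?

    Any x with d(x, q) <= radius satisfies
    dq - radius <= d(x, pivot) <= dq + radius for each pivot.
    """
    needs_inside1 = region < 2        # region requires d(x, p1) < r
    needs_inside2 = region % 2 == 0   # region requires d(x, p2) < r
    ok1 = (dq1 - radius < r) if needs_inside1 else (dq1 + radius >= r)
    ok2 = (dq2 - radius < r) if needs_inside2 else (dq2 + radius >= r)
    return ok1 and ok2


def range_query(q: Point, radius: float, p1: Point, p2: Point, r: float,
                buckets: List[List[Point]],
                dist: Distance = euclidean) -> Tuple[List[Point], int]:
    """Return the points within `radius` of q and the number of regions scanned."""
    dq1, dq2 = dist(q, p1), dist(q, p2)
    hits: List[Point] = []
    visited = 0
    for region, bucket in enumerate(buckets):
        if not region_may_intersect(region, dq1, dq2, r, radius):
            continue  # the whole region is pruned without any distance computation
        visited += 1
        hits.extend(x for x in bucket if dist(x, q) <= radius)
    return hits, visited
```

Because the regions are disjoint, each element is stored once, and a query only pays distance computations for the regions its ball can reach. A small query radius far from a region boundary therefore tends to scan a single region, which is the kind of pruning that disjoint slicing is meant to enable.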