Slicing the metric space to provide quick indexing of complex data in the main memory

  • Authors:
  • Caio César Mori Carélo; Ives Renê Venturini Pola; Ricardo Rodrigues Ciferri; Agma Juci Machado Traina; Caetano Traina, Jr; Cristina Dutra de Aguiar Ciferri

  • Affiliations:
  • Department of Computer Science, University of São Paulo at São Carlos, 13.560-970, São Carlos, SP, Brazil (Carélo; Pola; A. J. M. Traina; C. Traina, Jr; C. D. A. Ciferri)
  • Department of Computer Science, Federal University of São Carlos, 13.565-905, São Carlos, SP, Brazil (R. R. Ciferri)

  • Venue:
  • Information Systems
  • Year:
  • 2011

Abstract

Searching a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and it has been aided by metric access methods (MAMs). A growing number of applications require indices that can be built quickly and repeatedly while also answering similarity queries quickly. The increasing capacity and decreasing cost of main memory also motivate the use of memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots during insertion operations; and (iii) extended range and k-NN query algorithms that support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree always built the index faster. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.
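The general idea of slicing a metric space into disjoint subspaces and pruning them during range and k-NN queries can be illustrated with a small in-memory sketch. It is not the Onion-tree algorithm. As a simplifying assumption, each hypothetical `Node` keeps two pivots, and every later element goes to one of four disjoint regions depending on whether it lies inside the ball of radius d(p1, p2) around each pivot. The names `TwoPivotIndex`, `range_query` and `knn_query` are illustrative only. Both queries discard regions using triangle-inequality lower bounds. The k-NN search visits regions in increasing lower-bound order, which is a standard best-first order and not the visit order the paper proposes. Controlling the number of subspaces per node and replacing leaf pivots are not modeled.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]
Region = Tuple[bool, bool]  # (inside ball of pivot 1, inside ball of pivot 2)


class Node:
    """Hypothetical node: up to two pivots, plus four disjoint child regions."""

    def __init__(self) -> None:
        self.pivots: List[Point] = []
        self.radius = 0.0  # assumed ball radius around each pivot: d(p1, p2)
        self.children: Dict[Region, "Node"] = {}


class TwoPivotIndex:
    """Illustrative memory-based metric index (not the Onion-tree)."""

    def __init__(self, metric: Metric = math.dist) -> None:
        self.metric = metric
        self.root = Node()

    def insert(self, x: Point) -> None:
        node = self.root
        while len(node.pivots) == 2:  # full node: descend into x's region
            p1, p2 = node.pivots
            region = (self.metric(x, p1) < node.radius,
                      self.metric(x, p2) < node.radius)
            node = node.children.setdefault(region, Node())
        node.pivots.append(x)
        if len(node.pivots) == 2:
            node.radius = self.metric(node.pivots[0], node.pivots[1])

    @staticmethod
    def _lower_bound(region: Region, dq: List[float], r: float) -> float:
        """Triangle-inequality lower bound on d(q, x) for every x in region."""
        bound = 0.0
        for inside, d in zip(region, dq):
            # inside: d(q, x) > d(q, p) - r; outside: d(q, x) >= r - d(q, p)
            bound = max(bound, d - r if inside else r - d)
        return bound

    def range_query(self, q: Point, rq: float) -> List[Point]:
        result: List[Point] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            dq = [self.metric(q, p) for p in node.pivots]
            result.extend(p for p, d in zip(node.pivots, dq) if d <= rq)
            for region, child in node.children.items():
                # Skip the whole subtree when no element can be within rq.
                if self._lower_bound(region, dq, node.radius) <= rq:
                    stack.append(child)
        return result

    def knn_query(self, q: Point, k: int) -> List[Tuple[float, Point]]:
        if k <= 0:
            return []
        tie = itertools.count()  # tie-breaker so heapq never compares nodes
        frontier = [(0.0, next(tie), self.root)]  # min-heap by lower bound
        best: List[Tuple[float, int, Point]] = []  # max-heap via negated distance
        while frontier:
            lb, _, node = heapq.heappop(frontier)
            if len(best) == k and lb > -best[0][0]:
                break  # no unvisited region can beat the current k-th neighbor
            dq = [self.metric(q, p) for p in node.pivots]
            for p, d in zip(node.pivots, dq):
                heapq.heappush(best, (-d, next(tie), p))
                if len(best) > k:
                    heapq.heappop(best)  # drop the farthest candidate
            for region, child in node.children.items():
                # A child region lies inside its parent, so keep the larger bound.
                child_lb = max(lb, self._lower_bound(region, dq, node.radius))
                heapq.heappush(frontier, (child_lb, next(tie), child))
        return sorted((-neg_d, p) for neg_d, _, p in best)
```

For example, after calling `idx.insert(p)` for each point, `idx.range_query((0.5, 0.5), 0.1)` returns every stored point within distance 0.1, and `idx.knn_query((0.5, 0.5), 5)` returns the five nearest points as `(distance, point)` pairs. Because every region is disjoint, each element is stored exactly once, and a region can be discarded as a whole whenever its lower bound exceeds the query radius or the current k-th distance.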