The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

  • Authors:
  • Affiliations:
  • Venue:
  • ICDE '99 Proceedings of the 15th International Conference on Data Engineering
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree -- a multidimensional data structure for indexing high dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (e.g., R-tree, SS-tree, SR-tree) or a pure space partitioning (SP) one (e.g., KDB-tree, hB-tree); rather, it ``combines'' positive aspects of the two types of index structures a single data structure to achieve search performance more scalable to high dimensionalities than either of the above techniques (hence, the name ``hybrid''). Furthermore, unlike many data structures (e.g., distance based index structures like SS-tree, SR-tree), the hybrid tree can support queries based on arbitrary dist ance functions. Our experiments on ``real'' high dimensional large size feature databases demonstrate that the hybrid tree scal es well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scan at all dimensionalities for large sized databases.