PL-Tree: an efficient indexing method for high-dimensional data

  • Authors:
  • Jie Wang; Jian Lu; Zheng Fang; Tingjian Ge; Cindy Chen

  • Affiliations:
  • Department of Computer Science, University of Massachusetts Lowell, Lowell, MA (all authors)

  • Venue:
  • SSTD'13 Proceedings of the 13th international conference on Advances in Spatial and Temporal Databases
  • Year:
  • 2013

Abstract

The quest for processing data in high-dimensional space has resulted in a number of innovative indexing mechanisms. Choosing an appropriate indexing method for a given set of data requires careful consideration of data properties, data construction methods, and query types. We present a new indexing method to support efficient point queries, range queries, and k-nearest neighbor queries. Our method indexes objects dynamically using algebraic techniques, and it can substantially reduce the negative impact of the "curse of dimensionality". In particular, our method partitions the data space recursively into hypercubes of a certain capacity and labels each hypercube using the Cantor pairing function, so that all objects in the same hypercube have the same label. Because the Cantor pairing function is bijective and cheap to compute, it can map efficiently between high-dimensional vectors and scalar labels. The partitioning and labeling process splits a subspace when the number of data items it contains exceeds its capacity. From a data structure point of view, our method constructs a tree in which each parent node contains a number of labels and child pointers; we call it a PL-tree. We compare our method with popular indexing algorithms including R*-tree, X-tree, quad-tree, and iDistance. Our numerical results show that dynamic PL-tree indexing significantly outperforms these existing indexing mechanisms.
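
The labeling idea can be illustrated with a minimal Python sketch. The abstract only states that hypercubes are labeled with the Cantor pairing function; the way the pairing is folded over more than two dimensions, the uniform grid over the unit hypercube, and the helper names `label_of`, `coords_of`, and `cell_of` are assumptions made here for illustration and are not taken from the paper.

```python
from math import isqrt


def cantor_pair(x, y):
    """Cantor pairing: a bijection from pairs of non-negative integers to non-negative integers."""
    if x < 0 or y < 0:
        raise ValueError("Cantor pairing is defined for non-negative integers only")
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(z):
    """Inverse of cantor_pair: recover (x, y) from the scalar z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index of the diagonal containing z
    y = z - w * (w + 1) // 2          # offset along that diagonal
    return w - y, y


def label_of(coords):
    """Fold a d-dimensional integer vector into one scalar label.

    Assumption: dimensions are combined left to right by repeated pairing.
    """
    label = coords[0]
    for c in coords[1:]:
        label = cantor_pair(label, c)
    return label


def coords_of(label, dim):
    """Undo label_of for a vector of known dimension."""
    coords = []
    for _ in range(dim - 1):
        label, c = cantor_unpair(label)
        coords.append(c)
    coords.append(label)
    return tuple(reversed(coords))


def cell_of(point, cells_per_axis):
    """Integer grid cell containing a point of the unit hypercube.

    Assumption: a uniform grid with cells_per_axis cells along each dimension.
    """
    return tuple(min(int(v * cells_per_axis), cells_per_axis - 1) for v in point)


# Two points that fall into the same cell receive the same label.
a = label_of(cell_of((0.1, 0.6), 2))
b = label_of(cell_of((0.2, 0.9), 2))
print(a == b)
```

Because the pairing is a bijection, `coords_of(label_of(v), len(v))` returns `v` for any vector of non-negative integers, which is what allows a scalar label to stand in for a hypercube's position. Labels grow quickly as dimensions are folded in, so Python's arbitrary-precision integers are convenient for this sketch.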