Space-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces

  • Authors:
  • Gang Qian;Hyun-Jeong Seok;Qiang Zhu;Sakti Pramanik

  • Affiliations:
  • Department of Computer Science, University of Central Oklahoma, Edmond, USA OK 73034;Department of Computer and Information Science, The University of Michigan, Dearborn, USA MI 48128;Department of Computer and Information Science, The University of Michigan, Dearborn, USA MI 48128;Department of Computer Science and Engineering, Michigan State University, East Lansing, USA MI 48824

  • Venue:
  • DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Properly-designed bulk-loading techniques are more efficient than the conventional tuple-loading method in constructing a multidimensional index tree for a large data set. Although a number of bulk-loading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDS) and cannot be directly applied to non-ordered discrete data spaces (NDDS). In this paper, we present a new space-partitioning-based bulk-loading algorithm for the NSP-tree -- a multidimensional index tree recently developed for NDDSs . The algorithm constructs the target NSP-tree by repeatedly partitioning the underlying NDDS for a given data set until input vectors in every subspace can fit into a leaf node. Strategies to increase the efficiency of the algorithm, such as multi-way splitting, memory buffering and balanced space partitioning, are employed. Histograms that characterize the data distribution in a subspace are used to decide space partitions. Our experiments show that the proposed bulk-loading algorithm is more efficient than the tuple-loading algorithm and a popular generic bulk-loading algorithm that could be utilized to build the NSP-tree.