Bulk-loading the ND-tree in non-ordered discrete data spaces

  • Authors:
  • Hyun-Jeong Seok;Gang Qian;Qiang Zhu;Alexander R. Oswald;Sakti Pramanik

  • Affiliations:
  • Department of Computer and Information Science, The University of Michigan-Dearborn, Dearborn, MI;Department of Computer Science, University of Central Oklahoma, Edmond, OK;Department of Computer and Information Science, The University of Michigan-Dearborn, Dearborn, MI;Department of Computer Science, University of Central Oklahoma, Edmond, OK;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI

  • Venue:
  • DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
  • Year:
  • 2008

Abstract

Applications that need multidimensional index structures for efficient similarity queries often involve large amounts of data. The conventional tuple-loading approach, which builds such an index structure for a large data set by inserting one tuple at a time, is inefficient. To overcome this problem, a number of algorithms have been proposed to bulk-load index structures such as the R-tree from scratch for large data sets in continuous data spaces. However, many of them cannot be directly applied to a non-ordered discrete data space (NDDS), where data values on each dimension are discrete and have no natural ordering. No bulk-loading algorithm has been developed specifically for an index structure in an NDDS, such as the ND-tree. In this paper, we present a bulk-loading algorithm, called NDTBL, for the ND-tree in NDDSs. It adopts a special in-memory structure to efficiently construct the target ND-tree. It utilizes and extends some operations from the original ND-tree tuple-loading algorithm to exploit the properties of an NDDS when choosing and splitting data sets/nodes during the bulk-loading process. It also employs strategies such as multi-way splitting and memory buffering to enhance efficiency. Our experimental studies show that the presented algorithm is quite promising for bulk-loading the ND-tree for large data sets in NDDSs.
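To make the notion of a non-ordered discrete data space concrete, the following minimal Python sketch treats each vector as a tuple of letters with no ordering among them. It is not the NDTBL algorithm and does not reproduce the ND-tree's actual structures. The choice of Hamming distance as the similarity measure, the per-dimension value sets used as a bounding summary, and the names `hamming`, `bounding_sets`, and `min_distance` are all assumptions made for illustration.

```python
from typing import List, Sequence, Set

Vector = Sequence[str]


def hamming(u: Vector, v: Vector) -> int:
    """Count the dimensions on which two equal-length vectors differ.

    Assumption: Hamming distance stands in for the similarity measure;
    it needs only equality tests, so it works without any value ordering.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(1 for a, b in zip(u, v) if a != b)


def bounding_sets(vectors: Sequence[Vector]) -> List[Set[str]]:
    """Summarize a group of vectors by the set of values seen per dimension.

    Hypothetical helper: with no ordering there is no [min, max] interval,
    so a set of values per dimension plays the role of a bounding box.
    """
    return [set(column) for column in zip(*vectors)]


def min_distance(query: Vector, box: List[Set[str]]) -> int:
    """Lower bound on the Hamming distance from query to any vector in box."""
    return sum(1 for q, values in zip(query, box) if q not in values)


# Toy data over the alphabet {a, c, g, t}; the values are illustrative only.
data = [("a", "c", "g", "t"), ("a", "c", "g", "a"), ("t", "c", "g", "t")]
box = bounding_sets(data)
query = ("g", "c", "g", "t")

# A group can be skipped during a range query when its lower bound
# already exceeds the query radius.
radius = 1
if min_distance(query, box) <= radius:
    matches = [v for v in data if hamming(query, v) <= radius]
else:
    matches = []
```

In this sketch the lower bound never exceeds the true distance to any vector in the group, which is what lets a tree-style index prune whole groups without inspecting their members.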