Space-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces

Authors:
Gang Qian;Hyun-Jeong Seok;Qiang Zhu;Sakti Pramanik
Affiliations:
Department of Computer Science, University of Central Oklahoma, Edmond, OK 73034, USA;Department of Computer and Information Science, The University of Michigan, Dearborn, MI 48128, USA;Department of Computer and Information Science, The University of Michigan, Dearborn, MI 48128, USA;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
Venue:
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Year:
2008

Abstract

Properly designed bulk-loading techniques are more efficient than the conventional tuple-loading method for constructing a multidimensional index tree over a large data set. Although a number of bulk-loading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDS) and cannot be directly applied to non-ordered discrete data spaces (NDDS). In this paper, we present a new space-partitioning-based bulk-loading algorithm for the NSP-tree, a multidimensional index tree recently developed for NDDSs. The algorithm constructs the target NSP-tree by repeatedly partitioning the underlying NDDS for a given data set until the input vectors in every subspace fit into a leaf node. Strategies that increase the efficiency of the algorithm, such as multi-way splitting, memory buffering, and balanced space partitioning, are employed. Histograms that characterize the data distribution in a subspace are used to decide the space partitions. Our experiments show that the proposed bulk-loading algorithm is more efficient than both the tuple-loading algorithm and a popular generic bulk-loading algorithm that could be used to build the NSP-tree.
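
The core idea in the abstract (partition the space repeatedly, guided by histograms, until each subspace fits into a leaf) can be illustrated with a small sketch. This is not the algorithm from the paper: it is a simplified in-memory illustration that uses binary rather than multi-way splits, omits memory buffering, and does not reproduce the NSP-tree node structure. The names `balanced_letter_split`, `bulk_partition`, `collect_leaves`, and `leaf_capacity` are hypothetical, and the greedy balancing heuristic is an assumption chosen for brevity.

```python
from collections import Counter


def balanced_letter_split(hist):
    """Greedily split the letters of one dimension into two groups
    whose total counts are as close as this heuristic allows.
    Returns (group_a, group_b, imbalance)."""
    groups, totals = ([], []), [0, 0]
    # Place the most frequent letters first; ties are broken by letter.
    for letter, count in sorted(hist.items(), key=lambda kv: (-kv[1], kv[0])):
        i = 0 if totals[0] <= totals[1] else 1
        groups[i].append(letter)
        totals[i] += count
    return set(groups[0]), set(groups[1]), abs(totals[0] - totals[1])


def bulk_partition(vectors, leaf_capacity):
    """Recursively partition the vectors until each subspace fits in a leaf."""
    if len(vectors) <= leaf_capacity:
        return {"leaf": list(vectors)}

    best = None
    for dim in range(len(vectors[0])):
        # Histogram of letters on this dimension within the current subspace.
        hist = Counter(v[dim] for v in vectors)
        if len(hist) < 2:
            continue  # this dimension cannot separate the vectors
        group_a, group_b, imbalance = balanced_letter_split(hist)
        if best is None or imbalance < best[0]:
            best = (imbalance, dim, group_a, group_b)

    if best is None:
        # Only duplicate vectors remain; keep them in one oversized leaf.
        return {"leaf": list(vectors)}

    _, dim, group_a, group_b = best
    part_a = [v for v in vectors if v[dim] in group_a]
    part_b = [v for v in vectors if v[dim] in group_b]
    return {
        "dim": dim,
        "children": [
            (group_a, bulk_partition(part_a, leaf_capacity)),
            (group_b, bulk_partition(part_b, leaf_capacity)),
        ],
    }


def collect_leaves(node):
    """Return the list of leaf buckets in the partition tree."""
    if "leaf" in node:
        return [node["leaf"]]
    leaves = []
    for _, child in node["children"]:
        leaves.extend(collect_leaves(child))
    return leaves


if __name__ == "__main__":
    data = ["ACGT", "AAGT", "CCGT", "GTTA", "TTAA", "CGCG", "ACCA", "GGGT"]
    for bucket in collect_leaves(bulk_partition(data, leaf_capacity=2)):
        print(bucket)
```

Each split divides the letters of one dimension into two disjoint sets, so the resulting subspaces do not overlap, which is the defining property of a space-partitioning approach. When the input vectors are distinct, every leaf holds at most `leaf_capacity` vectors.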