A distance-based packing method for high dimensional data

Authors:
Tae-Wan Kim; Ki-Joune Li
Affiliations:
Department of Computer Science, Pusan National University, San-30, Jang-Jun Dong, Kum-Jung Gu, Pusan, Korea (both authors)
Venue:
ADC '03: Proceedings of the 14th Australasian Database Conference, Volume 17
Year:
2003

References:

The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD '90: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data.
Towards an analysis of range query performance in spatial data structures. PODS '93: Proceedings of the Twelfth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.
On packing R-trees. CIKM '93: Proceedings of the Second International Conference on Information and Knowledge Management.
An introduction to disk drive modeling. Computer.
Window query-optimal clustering of spatial objects. PODS '95: Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.
The SR-tree: an index structure for high-dimensional nearest neighbor queries. SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data.
Speeding up bulk-loading of quadtrees. GIS '97: Proceedings of the 5th ACM International Workshop on Advances in Geographic Information Systems.
Direct spatial search on pictorial databases using packed R-trees. SIGMOD '85: Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data.
Improved bulk-loading algorithms for quadtrees. Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems.
Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys (CSUR).
R-trees: a dynamic index structure for spatial searching. SIGMOD '84: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data.
Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations. EDBT '98: Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology.
STR: A Simple and Efficient Algorithm for R-Tree Packing. ICDE '97: Proceedings of the Thirteenth International Conference on Data Engineering.
The Effect of Buffering on the Performance of R-Trees. ICDE '98: Proceedings of the Fourteenth International Conference on Data Engineering.
Similarity Indexing with the SS-tree. ICDE '96: Proceedings of the Twelfth International Conference on Data Engineering.
A Generic Approach to Bulk Loading Multidimensional Index Structures. VLDB '97: Proceedings of the 23rd International Conference on Very Large Data Bases.
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases.
The X-tree: An Index Structure for High-Dimensional Data. VLDB '96: Proceedings of the 22nd International Conference on Very Large Data Bases.
Efficient Bulk-Loading of Gridfiles.

Abstract

The Minkowski-sum cost model indicates that balanced data partitioning is not beneficial for high-dimensional data. We therefore study several unbalanced partitioning methods and derive cost models for them from the Minkowski-sum cost model. Our cost models show that, under a uniform data distribution, the distance to one of the two ends of the data space dominates the expected cost. We generalize the studied methods to adapt to the data distribution and propose a new partitioning method for high-dimensional index structures, called DD-CSP (Distance-based Distribution-adaptive Cyclic Sliced Partition). At each partitioning step, DD-CSP splits the data from either the lower end or the higher end of the data space toward its center, choosing the end with a distance-based cost function. Exploiting this splitting pattern, we propose a data structure called DSR (Dimension-independent Single value Representation), which represents MBHs (Minimum Bounding Hyper-cubes) in a constant amount of storage, independent of the dimension. In our experiments, we compare DD-CSP with the R-tree, HP, STR, TGS, and the methods analyzed in this paper on real and synthetic data sets covering a wide range of dimensions and query selectivities from 10⁻¹ to 10⁻⁹. In all experiments, DD-CSP outperforms all other methods, achieving up to 567% savings in response time. It is therefore a clear winner in terms of both range-query performance and storage requirements.
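The abstract describes DD-CSP and DSR only at a high level, so the following Python sketch is a hypothetical reading of the two ideas, not the authors' algorithm. Assumptions: data are normalized to the unit hypercube [0,1]^d; every page holds a fixed number of points (`capacity`); the "thinner slice wins" rule stands in for the paper's distance cost function; and `hit_probability` is a boundary-aware variant of the Minkowski-sum model for cube queries placed uniformly inside the data space. The per-page `(side, cut)` record is a hypothetical DSR-style encoding: the split dimension is implied by the page's position in the cyclic order, so each page stores one value and one bit whatever the dimension. All function names are illustrative. The decoded boxes are partition regions, which enclose but are generally larger than true minimum bounding boxes.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)
Record = Tuple[int, float]             # (side: 0 = low end, 1 = high end, cut value)


def hit_probability(lo: List[float], hi: List[float], q: float) -> float:
    """Boundary-aware Minkowski-sum estimate of the probability that a cube
    query of side q (0 < q < 1), whose lower corner is uniform over
    [0, 1-q]^d, overlaps the box [lo, hi]."""
    p = 1.0
    for a, b in zip(lo, hi):
        # Query interval [x, x+q] overlaps [a, b] iff a - q <= x <= b.
        overlap = min(b, 1.0 - q) - max(a - q, 0.0)
        p *= max(overlap, 0.0) / (1.0 - q)
    return p


def sliced_partition(points: List[Point], capacity: int) -> Tuple[List[Record], List[List[Point]]]:
    """Cyclic sliced partition of points in [0,1]^d. Step k cuts one page of
    `capacity` points off dimension k mod d, from whichever end of the
    remaining region gives the thinner slice (a hypothetical distance rule)."""
    d = len(points[0])
    lo, hi = [0.0] * d, [1.0] * d  # bounds of the region still to be split
    rest = list(points)
    records: List[Record] = []
    pages: List[List[Point]] = []
    k = 0
    while len(rest) > capacity:
        j = k % d
        rest.sort(key=lambda pt: pt[j])
        low_cut = rest[capacity - 1][j]  # upper edge of a slice taken from the low end
        high_cut = rest[-capacity][j]    # lower edge of a slice taken from the high end
        if low_cut - lo[j] <= hi[j] - high_cut:
            records.append((0, low_cut))
            pages.append(rest[:capacity])
            rest = rest[capacity:]
            lo[j] = low_cut
        else:
            records.append((1, high_cut))
            pages.append(rest[-capacity:])
            rest = rest[:-capacity]
            hi[j] = high_cut
        k += 1
    pages.append(rest)  # the remainder forms the last page and needs no cut
    return records, pages


def decode_regions(records: List[Record], d: int) -> List[Box]:
    """Rebuild every page's region by replaying the (side, cut) records; the
    split dimension of record k is implied by the cyclic order (k mod d)."""
    lo, hi = [0.0] * d, [1.0] * d
    boxes: List[Box] = []
    for k, (side, cut) in enumerate(records):
        j = k % d
        page_lo, page_hi = lo[:], hi[:]
        if side == 0:
            page_hi[j] = cut  # low slice: [lo_j, cut]
            lo[j] = cut
        else:
            page_lo[j] = cut  # high slice: [cut, hi_j]
            hi[j] = cut
        boxes.append((page_lo, page_hi))
    boxes.append((lo[:], hi[:]))  # region of the final, uncut page
    return boxes


if __name__ == "__main__":
    random.seed(0)
    d, capacity = 16, 50
    pts = [tuple(random.random() for _ in range(d)) for _ in range(2000)]
    records, pages = sliced_partition(pts, capacity)
    boxes = decode_regions(records, d)
    selectivity = 1e-3
    q = selectivity ** (1.0 / d)  # side of a cube query with that volume
    cost = sum(hit_probability(lo, hi, q) for lo, hi in boxes)
    print(len(boxes), "pages; estimated page accesses:", round(cost, 2))
```

With these sizes, the 2,000 points fill 40 pages of 50 points each. Keeping one float and one bit per page makes the directory size independent of d, which is the property the abstract attributes to DSR; in this sketch, storing tighter, true minimum bounding boxes would cost extra space per dimension.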