Data space mapping for efficient I/O in large multi-dimensional databases

Authors:
Hakan Ferhatosmanoglu;Aravind Ramachandran;Divyakant Agrawal;Amr El Abbadi
Affiliations:
Computer Science and Engineering, Ohio State University, USA;Microsoft;Computer Science, University of California, Santa Barbara, USA;Computer Science, University of California, Santa Barbara, USA
Venue:
Information Systems
Year:
2007

Abstract

In this paper, we propose data space mapping techniques for storage and retrieval in multi-dimensional databases on multi-disk architectures. We identify the factors that matter most for efficient multi-disk searching of multi-dimensional data, and we develop secondary storage organization and retrieval techniques that address these factors directly. We focus especially on high-dimensional data, where none of the current approaches is effective. In contrast to current declustering techniques, the storage techniques in this paper consider both the inter-disk and the intra-disk organization of the data. The data space is first partitioned into buckets; the buckets are then declustered across multiple disks and clustered within each disk. Queries are executed through bucket identification techniques that locate the relevant pages. One of the partitioning techniques we discuss is especially practical for high-dimensional data, and our disk and page allocation techniques are optimal with respect to the number of I/O accesses and seek times. We provide experimental results on two real high-dimensional datasets that support our claims.
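
The pipeline outlined in the abstract (partition into buckets, decluster buckets across disks, cluster them within each disk, then identify buckets at query time) can be illustrated with a minimal sketch. This is not the paper's method: it assumes a uniform grid over the unit hypercube, uses the classic Disk Modulo rule (sum of cell coordinates modulo the number of disks) as a stand-in allocation scheme, and uses lexicographic order as a stand-in for the intra-disk layout. All function names and parameters are hypothetical.

```python
from itertools import product


def bucket_of(point, splits_per_dim):
    """Map a point in [0, 1)^d to its grid cell (assumed uniform grid)."""
    return tuple(min(int(x * splits_per_dim), splits_per_dim - 1) for x in point)


def disk_of(bucket, num_disks):
    """Disk Modulo allocation (stand-in scheme): adjacent cells along any
    one axis land on different disks when num_disks > 1."""
    return sum(bucket) % num_disks


def layout(dims, splits_per_dim, num_disks):
    """Return {disk: [buckets]}; each list is in lexicographic order, which
    stands in here for the on-disk (intra-disk) page order."""
    disks = {d: [] for d in range(num_disks)}
    for bucket in product(range(splits_per_dim), repeat=dims):
        disks[disk_of(bucket, num_disks)].append(bucket)
    return disks


def buckets_for_range(low, high, splits_per_dim):
    """Bucket identification for an axis-aligned range query [low, high]."""
    lo = bucket_of(low, splits_per_dim)
    hi = bucket_of(high, splits_per_dim)
    return list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


if __name__ == "__main__":
    disks = layout(dims=2, splits_per_dim=4, num_disks=4)
    hits = buckets_for_range((0.0, 0.0), (0.99, 0.0), splits_per_dim=4)
    for b in hits:
        d = disk_of(b, 4)
        print(f"bucket {b} -> disk {d}, page {disks[d].index(b)}")
```

In this toy setting a query that spans one full row of the grid touches each of the four disks exactly once, which is the kind of parallel I/O balance that declustering aims for. The page offset within each disk is what an intra-disk clustering scheme would try to keep contiguous to reduce seeks.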