Exploiting sequential access when declustering data over disks and MEMS-based storage

Authors:
Hailing Yu; Divyakant Agrawal; Amr El Abbadi
Affiliations:
Oracle Corporation, Redwood Shores, USA 94065; Computer Science Department, University of California at Santa Barbara, Santa Barbara, USA 93106; Computer Science Department, University of California at Santa Barbara, Santa Barbara, USA 93106
Venue:
Distributed and Parallel Databases
Year:
2006

Abstract

Due to the large difference between seek time and transfer time in current disk technology, it is advantageous to perform a large I/O as a single sequential access rather than as multiple small random accesses. However, prior work on optimal cost and data placement for range queries over two-dimensional datasets does not take this property into account. In particular, these techniques do not address sequential data placement when multiple I/O blocks must be retrieved from a single device. In this paper, we reevaluate the optimal cost of range queries when two-dimensional datasets are declustered over multiple devices, and prove that, in general, it is impossible to achieve the new optimal cost. The reason is that disks cannot provide the two-dimensional sequential access that the new optimal cost requires. We then revisit the existing data allocation schemes under the new optimal cost and show that none of them achieves it. Fortunately, MEMS-based storage is being developed to reduce I/O cost. We first show that the two-dimensional sequential access requirement cannot be satisfied by simply modeling MEMS-based storage as conventional disks. We then propose a new placement scheme that exploits the physical properties of MEMS-based storage to solve this problem. Our theoretical analysis and experimental results show that the new scheme achieves almost optimal I/O costs.
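The cost argument in the abstract can be illustrated with a minimal sketch. The cost model below (a fixed seek cost per access plus a fixed transfer cost per block), the function names, and the numeric values are illustrative assumptions and are not taken from the paper. The even-split bound of ceil(rows * cols / devices) blocks per device is the usual best case for declustering a rectangular range query, used here only to pick a block count.

```python
import math


def random_access_cost(blocks, seek_us, transfer_us):
    """Cost when every block is fetched by its own random access:
    each block pays one seek plus one transfer (assumed model)."""
    return blocks * (seek_us + transfer_us)


def sequential_access_cost(blocks, seek_us, transfer_us):
    """Cost when the blocks are contiguous on the device:
    one seek, then the blocks are transferred back to back (assumed model)."""
    if blocks == 0:
        return 0
    return seek_us + blocks * transfer_us


def blocks_per_device(rows, cols, devices):
    """Best-case number of blocks the busiest device must return for a
    rows x cols range query spread evenly over the devices."""
    return math.ceil(rows * cols / devices)


# Hypothetical figures in microseconds, chosen only to show the gap.
SEEK_US = 5000
TRANSFER_US = 100

per_device = blocks_per_device(rows=4, cols=6, devices=4)
scattered = random_access_cost(per_device, SEEK_US, TRANSFER_US)
contiguous = sequential_access_cost(per_device, SEEK_US, TRANSFER_US)
print(per_device, scattered, contiguous)
```

Under these assumed figures the seek term dominates the scattered case, which is why a placement that keeps each device's share of a query contiguous matters even when the blocks are already spread evenly across devices.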