Data partitioning and load balancing in parallel disk systems

Authors:
Peter Scheuermann;Gerhard Weikum;Peter Zabback
Affiliations:
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA, E-mail: peters@eecs.nwu.edu;Department of Computer Science, University of the Saarland, P.O. Box 151150, D-66041 Saarbrücken, Germany, E-mail: weikum@cs.uni-sb.de;Tandem Computers Incorporated, 10100 North Tantau Avenue, Cupertino, CA 95014-2542, USA, E-mail: zabback@loc251.tandem.com
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
1998

Abstract

Parallel disk systems offer two ways to exploit I/O parallelism: inter-request parallelism and intra-request parallelism. In this paper, we discuss the two main issues in performance tuning of such systems, striping and load balancing, and show how they relate to response time and throughput. We outline the main components of an intelligent, self-reliant file system that optimizes striping by taking the requirements of the applications into account, and that balances load through judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.
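
The two tuning issues named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `stripe_blocks` and `greedy_allocate`, the per-file access rate called "heat" here, and the hottest-first greedy rule are all assumptions chosen for illustration. The first function shows round-robin striping with a configurable stripe unit; the second shows one simple way to balance load when allocating files to disks.

```python
from typing import Dict, List, Tuple


def stripe_blocks(num_blocks: int, num_disks: int, stripe_unit: int,
                  start_disk: int = 0) -> List[List[int]]:
    """Round-robin striping (illustrative): each run of `stripe_unit`
    consecutive blocks goes to the next disk, wrapping around."""
    layout: List[List[int]] = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        disk = (start_disk + block // stripe_unit) % num_disks
        layout[disk].append(block)
    return layout


def greedy_allocate(file_heat: Dict[str, float],
                    num_disks: int) -> Tuple[Dict[str, int], List[float]]:
    """Greedy load balancing (an assumed heuristic, not taken from the paper):
    place files hottest-first, each on the currently least-loaded disk."""
    load = [0.0] * num_disks
    placement: Dict[str, int] = {}
    # Sort by descending heat; break ties by name so the result is deterministic.
    for name, heat in sorted(file_heat.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(range(num_disks), key=lambda d: load[d])
        placement[name] = target
        load[target] += heat
    return placement, load


if __name__ == "__main__":
    # A small stripe unit spreads one request over many disks (intra-request
    # parallelism); a large one keeps a request on few disks, leaving the
    # others free to serve different requests (inter-request parallelism).
    print(stripe_blocks(num_blocks=8, num_disks=3, stripe_unit=2))
    print(greedy_allocate({"a": 5, "b": 3, "c": 3, "d": 1}, num_disks=2))
```

The choice of stripe unit trades one form of parallelism against the other, which is why the abstract treats striping as something to tune per application rather than fix globally.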