Parallel Input/Output with Heterogeneous Disks

Authors:
Szu-Wen Kuo; Marianne Winslett; Ying Chen; Yong Cho; Mahesh Subramaniam; Kent E. Seamons
Venue:
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Year:
1997

Abstract

Panda is a high-performance library for accessing large multidimensional array data on the secondary storage of parallel platforms and networks of workstations. When using Panda as the I/O component of a scientific application, H3expresso, on the IBM SP2 at the Cornell Theory Center, we found that some nodes are more powerful with respect to I/O than others, so load balancing techniques are required to maintain high performance. We expect that heterogeneity will also be a significant issue for DBMSs or parallel I/O libraries designed for scientific applications running on networks of workstations. The methods of allocating data to servers in these environments will need to take heterogeneity into account while still allowing users to exert control over data layout.

We propose such an approach to load balancing, under which we respect the user's choice of high-level disk layout but introduce automatic subchunking. Subchunks let us divide the very large chunks typically specified by the user's disk layout into units of more manageable size, which can be allocated to I/O nodes in a manner that distributes the load fairly. We also present two techniques for allocating subchunks to nodes, static and dynamic, and evaluate their performance on the SP2.
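To make the idea of subchunking concrete, the following is a minimal sketch in Python and not Panda's actual interface or the paper's algorithms. All function names, the per-node rate model, and the numbers are assumptions chosen for illustration. It splits one chunk into fixed-size subchunks, then compares a static assignment (decided up front from estimated node rates) with a dynamic one (each node takes the next subchunk as soon as it is free), using a simple simulated completion time.

```python
import heapq


def split_into_subchunks(chunk_bytes, subchunk_bytes):
    """Return the sizes of the subchunks covering one chunk (hypothetical helper).

    The last subchunk may be smaller than the others.
    """
    full, rest = divmod(chunk_bytes, subchunk_bytes)
    return [subchunk_bytes] * full + ([rest] if rest else [])


def static_allocation(subchunks, estimated_rates):
    """Assign contiguous runs of subchunks in proportion to estimated node rates.

    The assignment is fixed before any I/O happens, so it is only as good
    as the estimates.
    """
    total = sum(estimated_rates)
    n = len(subchunks)
    assignment = []
    start, cumulative = 0, 0.0
    for node, rate in enumerate(estimated_rates):
        cumulative += rate
        # The last node always takes whatever remains.
        end = n if node == len(estimated_rates) - 1 else round(n * cumulative / total)
        assignment.append(list(range(start, end)))
        start = end
    return assignment


def dynamic_allocation(subchunks, actual_rates):
    """Simulate nodes pulling the next subchunk whenever they become idle.

    Returns the assignment and the simulated completion time.
    """
    idle_at = [(0.0, node) for node in range(len(actual_rates))]
    heapq.heapify(idle_at)
    assignment = [[] for _ in actual_rates]
    for index, size in enumerate(subchunks):
        time_free, node = heapq.heappop(idle_at)
        assignment[node].append(index)
        heapq.heappush(idle_at, (time_free + size / actual_rates[node], node))
    return assignment, max(time_free for time_free, _ in idle_at)


def finish_time(assignment, subchunks, actual_rates):
    """Completion time of a fixed assignment: the slowest node determines it."""
    return max(
        sum(subchunks[i] for i in indices) / rate
        for indices, rate in zip(assignment, actual_rates)
    )


if __name__ == "__main__":
    subchunks = split_into_subchunks(100, 10)  # ten subchunks of 10 units each
    actual = [1.0, 1.0, 2.0]                   # assumed: node 2 is twice as fast
    uniform = static_allocation(subchunks, [1.0, 1.0, 1.0])
    print("static, uniform estimate:", finish_time(uniform, subchunks, actual))
    print("dynamic:", dynamic_allocation(subchunks, actual)[1])
```

In this toy model the static assignment built from a uniform estimate leaves the fast node idle while a slow node finishes, whereas the dynamic assignment adapts to the observed rates. The sketch ignores effects a real system would face, such as request overhead per subchunk and the cost of noncontiguous disk access.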