Panda is a high-performance library for accessing large multidimensional array data on secondary storage of parallel platforms and networks of workstations. When using Panda as the I/O component of a scientific application, H3expresso, on the IBM SP2 at Cornell Theory Center, we found that some nodes are more powerful with respect to I/O than others, so load balancing techniques are needed to maintain high performance. We expect that heterogeneity will also be a significant issue for DBMSs and parallel I/O libraries designed for scientific applications running on networks of workstations, and that the methods of allocating data to servers in these environments will need to take heterogeneity into account while still allowing users to exert control over data layout. We propose such an approach to load balancing, under which we respect the user's choice of high-level disk layout but introduce automatic subchunking. Subchunks allow us to divide the very large chunks typically specified by the user's disk layout into more manageably sized units that can be allocated to I/O nodes in a manner that distributes the load fairly. We also present two techniques for allocating subchunks to nodes, static and dynamic, and evaluate their performance on the SP2.
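
The subchunking idea can be illustrated with a small sketch. This is not Panda's API or the paper's algorithm: the function names, the per-node `speeds` estimates, and the simulated timing model are all assumptions made for illustration. It splits one user-specified chunk into subchunks, then contrasts a static assignment (decided up front from assumed node speeds) with a dynamic one (each node takes the next subchunk when it becomes idle).

```python
import heapq

def make_subchunks(chunk_bytes, subchunk_bytes):
    """Split one chunk into subchunk sizes; the last one may be smaller."""
    full, rest = divmod(chunk_bytes, subchunk_bytes)
    return [subchunk_bytes] * full + ([rest] if rest else [])

def static_allocate(subchunks, speeds):
    """Hypothetical static policy: decide every owner before any I/O starts,
    giving each subchunk to the node with the lowest estimated finish time.
    `speeds` are assumed bytes-per-second estimates, one per I/O node."""
    load = [0.0] * len(speeds)  # estimated busy time per node
    owner = []
    for size in subchunks:
        node = min(range(len(speeds)), key=lambda n: load[n] + size / speeds[n])
        load[node] += size / speeds[node]
        owner.append(node)
    return owner

def dynamic_allocate(subchunks, speeds):
    """Hypothetical dynamic policy, simulated: whichever node becomes idle
    first takes the next subchunk. Here `speeds` only drives the simulation;
    a real scheduler would react to observed completions instead."""
    idle = [(0.0, n) for n in range(len(speeds))]  # (time node is free, node id)
    heapq.heapify(idle)
    owner = []
    for size in subchunks:
        free_at, node = heapq.heappop(idle)
        owner.append(node)
        heapq.heappush(idle, (free_at + size / speeds[node], node))
    return owner

if __name__ == "__main__":
    parts = make_subchunks(chunk_bytes=12, subchunk_bytes=4)
    print(static_allocate(parts, speeds=[2.0, 1.0]))
    print(dynamic_allocate(parts, speeds=[2.0, 1.0]))
```

In both policies the faster node ends up with more subchunks, which is only possible because the subchunks are small relative to the original chunk. The static variant depends on speed estimates being known in advance, whereas the dynamic variant adapts to whatever speeds the nodes actually exhibit.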