Collective Buffering: Improving Parallel I/O Performance

Authors:
Bill Nitzberg; Virginia Lo
Venue:
HPDC '97: Proceedings of the 6th IEEE International Symposium on High Performance Distributed Computing
Year:
1997

Abstract

"Parallel I/O" is support for a single parallel application running on many nodes: application data is distributed among the nodes and is read from or written to a single logical file, which is itself spread across nodes and disks. Parallel I/O is thus a mapping problem from the data layout in node memory to the file layout on disks. Because this mapping can be quite complicated and can involve significant data movement, optimizing it is critical for performance. We discuss our general model of the problem, describe four Collective Buffering algorithms we designed, and report experiments testing their performance on an Intel Paragon and an IBM SP2, both housed at NASA Ames Research Center. Our experiments show improvements of up to two orders of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.
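To make the memory-to-file mapping problem concrete, here is a minimal simulation of the generic two-phase collective buffering idea. It is not one of the paper's four algorithms. It assumes a 1-D global array distributed block-cyclically across nodes, with the file stored in plain index order. Every function name, as well as the block size and buffer count, is hypothetical and chosen purely for illustration. The code counts write requests rather than performing real I/O. Each maximal run of consecutive file offsets is treated as one request.

```python
from collections import defaultdict


def cyclic_layout(n_elems, n_nodes, block):
    """Hypothetical block-cyclic distribution: map each node to the
    global indices (file offsets) it holds in memory."""
    owned = defaultdict(list)
    for i in range(n_elems):
        owned[(i // block) % n_nodes].append(i)
    return owned


def contiguous_runs(indices):
    """Count maximal runs of consecutive file offsets; each run is
    modeled as one write request to the file system."""
    runs, prev = 0, None
    for i in sorted(indices):
        if prev is None or i != prev + 1:
            runs += 1
        prev = i
    return runs


def naive_requests(owned):
    """Each node writes its own pieces directly: one request per run."""
    return sum(contiguous_runs(idx) for idx in owned.values())


def collective_buffering(owned, n_elems, n_buffers):
    """Phase 1: redistribute elements so that buffer b holds the contiguous
    file domain [b*size, (b+1)*size). Phase 2: each buffer writes its domain.
    Returns (node-to-buffer messages, file write requests)."""
    size = -(-n_elems // n_buffers)  # ceiling division
    buffers = defaultdict(list)
    messages = set()
    for node, idx in owned.items():
        for i in idx:
            b = i // size
            buffers[b].append(i)
            messages.add((node, b))  # at most one message per (node, buffer) pair
    writes = sum(contiguous_runs(idx) for idx in buffers.values())
    return len(messages), writes


if __name__ == "__main__":
    owned = cyclic_layout(n_elems=1024, n_nodes=8, block=4)
    print(naive_requests(owned))                 # 256 small writes
    print(collective_buffering(owned, 1024, 2))  # (16, 2): 16 messages, 2 large writes
```

The design trade-off shown here is that collective buffering adds an interprocessor redistribution step (phase 1) so that the disk sees a few large, contiguous requests instead of many small, scattered ones. Whether this pays off on a given machine depends on the relative costs of message passing and disk access. The simulation does not model those costs.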