The impact of spatial layout of jobs on parallel I/O performance
Proceedings of the sixth workshop on I/O in parallel and distributed systems
Sourcebook of parallel computing
Discretionary Caching for I/O on Clusters
Cluster Computing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Scaling parallel I/O performance through I/O delegate and caching system
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A cost-intelligent application-specific data layout scheme for parallel file systems
Proceedings of the 20th international symposium on High performance distributed computing
Self-adaptive hints for collective i/o
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
A highly reliable and parallelizable data distribution scheme for data grids
Future Generation Computer Systems
Improving collective I/O performance by pipelining request aggregation and file access
Proceedings of the 20th European MPI Users' Group Meeting
Hi-index | 0.00 |
"Parallel I/O" is the support of a single parallel application run on many nodes; application data is distributed among the nodes, and is read or written to a single logical file, itself spread across nodes and disks. Parallel I/O is a mapping problem from the data layout in node memory to the file layout on disks. Since the mapping can be quite complicated and involve significant data movement, optimizing the mapping is critical for performance. We discuss our general model of the problem, describe four Collective Buffering algorithms we designed, and report experiments testing their performance on an Intel Paragon and an IBM SP2 both housed at NASA Ames Research Center. Our experiments show improvements of up to two order of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.