In this paper, we propose a multi-buffer pipelining approach that improves collective I/O performance by overlapping the dominant request aggregation phase with the I/O phase in the two-phase I/O implementation. For an individual collective I/O call, our pipelining method first divides the collective buffer into a group of smaller buffers and then pipelines the asynchronous communication that exchanges I/O requests with the I/O requests sent to the file system. Our performance evaluation of a representative I/O benchmark and a production application shows a 20% improvement in I/O time, against a theoretical upper bound of 50% when both phases overlap completely.
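The 50% upper bound can be illustrated with a small timing model. The sketch below is not the paper's implementation: the function names, the per-buffer costs, and the assumption that a free sub-buffer is always available for the next exchange are all hypothetical, chosen only to show why splitting one collective buffer into several sub-buffers lets the exchange of one sub-buffer overlap the file write of the previous one.

```python
def sequential_time(comm, io):
    """Two-phase I/O without overlap: every exchange and every write runs back to back."""
    return sum(comm) + sum(io)


def pipelined_time(comm, io):
    """Exchange of sub-buffer i+1 overlaps the file write of sub-buffer i.

    Assumption (hypothetical): a free sub-buffer is always available, so the
    exchanges never stall waiting for a write to finish.
    """
    comm_done = 0.0  # time at which the current sub-buffer's exchange finishes
    io_done = 0.0    # time at which the most recent write finishes
    for c, w in zip(comm, io):
        comm_done += c
        # A write starts once its data has arrived and the previous write is done.
        io_done = max(io_done, comm_done) + w
    return io_done


def improvement(num_buffers, comm_cost=1.0, io_cost=1.0):
    """Fraction of I/O time saved by pipelining, for equal-cost sub-buffers."""
    comm = [comm_cost] * num_buffers
    io = [io_cost] * num_buffers
    return 1.0 - pipelined_time(comm, io) / sequential_time(comm, io)


if __name__ == "__main__":
    for n in (1, 2, 8, 64):
        print(n, round(improvement(n), 4))
```

Under this model, a single buffer gives no overlap, and with equal exchange and write costs the saving approaches but never reaches 50% as the number of sub-buffers grows, since the first exchange and the last write cannot be hidden. When one phase costs more than the other, the achievable saving is smaller.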