Enhancing the performance of MPI-IO applications by overlapping I/O, computation and communication
Many scientific applications use parallel I/O to meet their low-latency, high-bandwidth I/O requirements. Among the available parallel I/O operations, collective I/O is one of the most popular when the storage layout of the data does not match the access pattern. A typical collective I/O implementation performs disk I/O operations followed by interprocessor communication. In addition, in many I/O-intensive applications, parallel I/O operations are followed by parallel computation. This paper presents a comparative study of different overlap strategies in parallel applications. We experimented with four strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results are very encouraging: on average, we improved the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
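To make the idea concrete, the sketch below illustrates one way to realize strategy 2 (overlapping I/O and computation) with double buffering: while one chunk is being computed on, the collective read for the next chunk is in flight. It uses the standard MPI-2 split-collective interface rather than the paper's own mechanism, which the abstract does not describe; the file name, chunk and step counts, and the compute_on() helper are hypothetical placeholders, and whether the posted read actually progresses in the background depends on the MPI implementation.

/* Minimal sketch of overlapping collective I/O with computation
 * via MPI-2 split-collective reads. Assumes "input.dat" holds at
 * least NCHUNKS * CHUNK doubles; not the paper's implementation. */
#include <mpi.h>
#include <stdlib.h>

#define NCHUNKS 8            /* number of chunks to pipeline (assumed) */
#define CHUNK   (1 << 20)    /* elements per chunk (assumed) */

static void compute_on(double *buf, int n)
{
    for (int i = 0; i < n; i++)   /* stand-in for the real computation */
        buf[i] *= 2.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "input.dat",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    double *cur  = malloc(CHUNK * sizeof *cur);
    double *next = malloc(CHUNK * sizeof *next);
    MPI_Status st;

    /* Prime the pipeline: the individual file pointer advances
     * automatically between successive collective reads. */
    MPI_File_read_all_begin(fh, cur, CHUNK, MPI_DOUBLE);
    MPI_File_read_all_end(fh, cur, &st);

    for (int step = 1; step < NCHUNKS; step++) {
        /* Post the collective read for the next chunk ...            */
        MPI_File_read_all_begin(fh, next, CHUNK, MPI_DOUBLE);
        /* ... and compute on the current chunk while it is serviced.
         * Only one split collective is pending per file handle, as
         * the MPI standard requires. */
        compute_on(cur, CHUNK);
        MPI_File_read_all_end(fh, next, &st);

        double *tmp = cur; cur = next; next = tmp;  /* swap buffers */
    }
    compute_on(cur, CHUNK);   /* process the final chunk */

    MPI_File_close(&fh);
    free(cur);
    free(next);
    MPI_Finalize();
    return 0;
}

Double buffering keeps the pattern simple: the begin/end pair brackets exactly the computation it should hide, and the buffer swap avoids any copying between pipeline stages.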