Enhancing the performance of MPI-IO applications by overlapping I/O, computation and communication
Many scientific applications use parallel I/O to meet their low-latency, high-bandwidth I/O requirements. Among the available parallel I/O operations, collective I/O is one of the most popular when the storage layout of the data does not match the access pattern. A typical collective I/O implementation performs disk I/O operations followed by interprocessor communication. In addition, in many I/O-intensive applications, parallel I/O operations are followed by parallel computation. This paper presents a comparative study of different overlap strategies in parallel applications. We experimented with four strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results are very encouraging: on average, we improved the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
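To make the idea concrete, the sketch below illustrates one way to realize strategy 2 (overlapping I/O and computation) with double buffering: while one chunk is being computed on, the collective read for the next chunk is in flight. It uses the standard MPI-2 split-collective interface rather than the paper's own mechanism, which the abstract does not describe; the file name, chunk and step counts, and the compute_on() helper are hypothetical placeholders, and whether the posted read actually progresses in the background depends on the MPI implementation.

/* Minimal sketch of overlapping collective I/O with computation
 * via MPI-2 split-collective reads. Assumes "input.dat" holds at
 * least NCHUNKS * CHUNK doubles; not the paper's implementation. */
#include <mpi.h>
#include <stdlib.h>

#define NCHUNKS 8            /* number of chunks to pipeline (assumed) */
#define CHUNK   (1 << 20)    /* elements per chunk (assumed) */

static void compute_on(double *buf, int n)
{
    for (int i = 0; i < n; i++)   /* stand-in for the real computation */
        buf[i] *= 2.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "input.dat",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    double *cur  = malloc(CHUNK * sizeof *cur);
    double *next = malloc(CHUNK * sizeof *next);
    MPI_Status st;

    /* Prime the pipeline: the individual file pointer advances
     * automatically between successive collective reads. */
    MPI_File_read_all_begin(fh, cur, CHUNK, MPI_DOUBLE);
    MPI_File_read_all_end(fh, cur, &st);

    for (int step = 1; step < NCHUNKS; step++) {
        /* Post the collective read for the next chunk ...            */
        MPI_File_read_all_begin(fh, next, CHUNK, MPI_DOUBLE);
        /* ... and compute on the current chunk while it is serviced.
         * Only one split collective is pending per file handle, as
         * the MPI standard requires. */
        compute_on(cur, CHUNK);
        MPI_File_read_all_end(fh, next, &st);

        double *tmp = cur; cur = next; next = tmp;  /* swap buffers */
    }
    compute_on(cur, CHUNK);   /* process the final chunk */

    MPI_File_close(&fh);
    free(cur);
    free(next);
    MPI_Finalize();
    return 0;
}

Double buffering keeps the pattern simple: the begin/end pair brackets exactly the computation it should hide, and the buffer swap avoids any copying between pipeline stages.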