Automatic Run-time Parallelization and Transformation of I/O

Authors:
Thorvald Natvig;Anne C. Elster;Jan Christian Meyer
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

As computational clusters grow, I/O can be expected to consume an increasing portion of wall-clock time as the problem and node sizes are scaled up, unless parallel I/O is introduced. Unfortunately, using parallel I/O is non-trivial, so few applications developed by individual researchers enjoy its benefits. In this paper, we describe our novel method for analyzing I/O and communication operations at run-time. When nodes perform I/O or communication operations, our technique protects the memory associated with the requests from access by the application. Subsequent operations are analyzed for overlap between communication and I/O operations. When such overlap is found, our injected library automatically transforms the I/O operation from an individual operation into a collective, shared MPI I/O operation. This allows users to benefit from parallel file systems without redesigning or recompiling their applications, and we demonstrate speedup for common usage patterns.
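
The overlap analysis mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each pending request is logged as a buffer address range tagged as I/O or communication, and that an I/O request becomes a candidate for a collective operation when its buffer intersects a communication buffer. The names `Request`, `overlaps`, and `collective_candidates` are hypothetical, and the memory protection and the actual MPI I/O transformation are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A logged operation; all fields are illustrative assumptions."""
    kind: str    # "io" or "comm"
    start: int   # address of the first byte of the buffer
    length: int  # buffer size in bytes

    @property
    def end(self) -> int:
        # One past the last byte, so ranges are half-open: [start, end)
        return self.start + self.length


def overlaps(a: Request, b: Request) -> bool:
    """Return True when the two half-open byte ranges intersect."""
    return a.start < b.end and b.start < a.end


def collective_candidates(requests):
    """Return I/O requests whose buffer overlaps any communication buffer."""
    comm = [r for r in requests if r.kind == "comm"]
    return [
        r for r in requests
        if r.kind == "io" and any(overlaps(r, c) for c in comm)
    ]


if __name__ == "__main__":
    log = [
        Request("io", 0x1000, 4096),    # shares bytes with the comm buffer
        Request("comm", 0x1800, 1024),
        Request("io", 0x4000, 4096),    # touches no comm buffer
    ]
    for r in collective_candidates(log):
        print(f"candidate: io at {r.start:#x}, {r.length} bytes")
```

In this sketch only the first I/O request is reported, because its buffer shares bytes with the communication buffer. A real run-time library would obtain these ranges by intercepting the application's I/O and MPI calls rather than from a hand-written log.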