It is widely known that MPI-IO performs poorly in a Lustre file system environment, although the reasons for this poor performance are not well understood. The research presented in this paper strongly supports our hypothesis that MPI-IO performs poorly in this environment because of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research shows that this approach generally provides the worst performance in a Lustre environment, and that the best performance is often obtained when the aggregator processes perform a large number of small, non-contiguous I/O operations. In this paper, we first demonstrate and explain these non-intuitive results. We then present a user-level library, termed Y-lib, which redistributes data in a way that conforms much more closely to the Lustre storage architecture than does the data redistribution pattern employed by MPI-IO. We then provide experimental results showing that Y-lib can increase performance by between 300% and 1000%, depending on the number of aggregator processes and the file size. Finally, we modify MPI-IO itself to use our data redistribution scheme, and show that doing so yields a performance increase of similar magnitude over the current MPI-IO data redistribution algorithms.
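The underlying idea can be illustrated with a minimal sketch (this is not the authors' actual Y-lib code). Rather than having each aggregator issue one large contiguous write that spans every OST, each aggregator issues many small writes that all land on the stripes belonging to a single OST, matching Lustre's round-robin striping. The stripe size, stripe count, stripes-per-aggregator value, and file name below are illustrative assumptions, as is the simplifying one-aggregator-per-OST mapping.

```c
/* Hedged sketch: stripe-aligned writes in which each aggregator
   touches only the stripes mapped to "its" OST. Stripe size,
   stripe count, and write counts are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define STRIPE_SIZE   (1 << 20)   /* assumed Lustre stripe size: 1 MiB     */
#define STRIPE_COUNT  8           /* assumed number of OSTs in the stripe  */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank acts as the aggregator for one OST (assumes
       nprocs == STRIPE_COUNT for simplicity). */
    const int stripes_per_agg = 16;            /* illustrative */
    char *buf = malloc((size_t)STRIPE_SIZE);
    memset(buf, 'a' + rank, STRIPE_SIZE);

    MPI_File_open(MPI_COMM_WORLD, "outfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Instead of one large contiguous write spanning all OSTs,
       issue many small writes that all land on the same OST:
       with round-robin striping, stripe k of OST r lives at
       file offset (k * STRIPE_COUNT + r) * STRIPE_SIZE. */
    for (int k = 0; k < stripes_per_agg; k++) {
        MPI_Offset off = ((MPI_Offset)k * STRIPE_COUNT + rank)
                         * STRIPE_SIZE;
        MPI_File_write_at(fh, off, buf, STRIPE_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Under this access pattern, each aggregator's writes are confined to a single OST and never cross a stripe boundary, which avoids the cross-OST lock contention that large contiguous writes incur; this is the property the paper attributes to its Lustre-conforming redistribution, though the concrete parameters here are only assumptions for the sketch.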