It is widely known that MPI-IO performs poorly in a Lustre file system environment, although the reasons for this poor performance are not well understood. The research presented in this paper strongly supports our hypothesis that MPI-IO performs poorly in this environment because of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research shows that this approach generally provides the worst performance in a Lustre environment, and that the best performance is often obtained when the aggregator processes perform a large number of small, non-contiguous I/O operations. In this paper, we first demonstrate and explain these non-intuitive results. We then present a user-level library, termed Y-lib, which redistributes data in a way that conforms much more closely to the Lustre storage architecture than does the data redistribution pattern employed by MPI-IO. We then provide experimental results showing that Y-lib can increase performance by between 300% and 1000%, depending on the number of aggregator processes and the file size. Finally, we modify MPI-IO itself to use our data redistribution scheme, and show that doing so yields a performance increase of similar magnitude over the current MPI-IO data redistribution algorithms.
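The underlying idea can be illustrated with a minimal sketch (this is not the authors' actual Y-lib code). Rather than having each aggregator issue one large contiguous write that spans every OST, each aggregator issues many small writes that all land on the stripes belonging to a single OST, matching Lustre's round-robin striping. The stripe size, stripe count, stripes-per-aggregator value, and file name below are illustrative assumptions, as is the simplifying one-aggregator-per-OST mapping.

```c
/* Hedged sketch: stripe-aligned writes in which each aggregator
   touches only the stripes mapped to "its" OST. Stripe size,
   stripe count, and write counts are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define STRIPE_SIZE   (1 << 20)   /* assumed Lustre stripe size: 1 MiB     */
#define STRIPE_COUNT  8           /* assumed number of OSTs in the stripe  */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank acts as the aggregator for one OST (assumes
       nprocs == STRIPE_COUNT for simplicity). */
    const int stripes_per_agg = 16;            /* illustrative */
    char *buf = malloc((size_t)STRIPE_SIZE);
    memset(buf, 'a' + rank, STRIPE_SIZE);

    MPI_File_open(MPI_COMM_WORLD, "outfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Instead of one large contiguous write spanning all OSTs,
       issue many small writes that all land on the same OST:
       with round-robin striping, stripe k of OST r lives at
       file offset (k * STRIPE_COUNT + r) * STRIPE_SIZE. */
    for (int k = 0; k < stripes_per_agg; k++) {
        MPI_Offset off = ((MPI_Offset)k * STRIPE_COUNT + rank)
                         * STRIPE_SIZE;
        MPI_File_write_at(fh, off, buf, STRIPE_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Under this access pattern, each aggregator's writes are confined to a single OST and never cross a stripe boundary, which avoids the cross-OST lock contention that large contiguous writes incur; this is the property the paper attributes to its Lustre-conforming redistribution, though the concrete parameters here are only assumptions for the sketch.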