Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems

  • Authors:
  • Xuechen Zhang; Song Jiang; Kei Davis

  • Affiliations:
  • ECE Department, Wayne State University, Detroit, MI 48202, USA; ECE Department, Wayne State University, Detroit, MI 48202, USA; Computer and Computational Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

  • Venue:
  • IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
  • Year:
  • 2009

Abstract

Collective I/O is a widely used technique for improving I/O performance in parallel computing. It can be implemented as either a client-based or a server-based scheme. The client-based implementation is more widely adopted in MPI-IO software such as ROMIO because of its independence from the storage system configuration and its greater portability. However, existing implementations of client-side collective I/O do not consider the actual pattern of file striping over multiple I/O nodes in the storage system. This can cause a large number of requests for non-sequential data at the I/O nodes, substantially degrading I/O performance. Investigating the surprisingly high I/O throughput achieved when a particular request pattern happens to match the data striping pattern on the I/O nodes, we identify this resonance phenomenon as the cause. Exploiting readily available information on data striping from the metadata server in popular file systems such as PVFS2 and Lustre, we design a new collective I/O implementation technique, named resonant I/O, that makes resonance a common case. Resonant I/O rearranges requests from multiple MPI processes according to the presumed data layout on the disks of the I/O nodes, so that non-sequential disk accesses are turned into sequential accesses, significantly improving I/O performance without compromising the independence of a client-based implementation. We have implemented our design in ROMIO. Our experimental results on small- and medium-scale clusters show that the scheme can increase the I/O throughput of commonly used parallel I/O benchmarks, such as mpi-io-test and ior-mpi-io, by up to 157% over the existing ROMIO implementation, with no scenario showing significantly decreased performance.
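
The abstract describes the core idea at a high level: reorder the requests gathered by a collective I/O call according to the file's striping layout so that each I/O node receives a sequential access stream. The sketch below is not the authors' ROMIO code; it is only a minimal illustration of that reordering step under assumed parameters (round-robin striping, a fixed stripe_size, num_servers I/O nodes) and with hypothetical names (io_request, server_of, cmp_requests) introduced here for clarity.

/*
 * Minimal sketch of the reordering idea behind resonant I/O.
 * Assumptions (not taken from the paper's text): round-robin
 * striping with a fixed stripe unit across num_servers I/O nodes.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long long offset;   /* byte offset in the shared file   */
    long long length;   /* request length in bytes          */
    int       rank;     /* MPI rank that issued the request */
} io_request;

static long long stripe_size = 64 * 1024;  /* assumed stripe unit     */
static int       num_servers = 4;          /* assumed I/O node count  */

/* I/O node holding the stripe that contains this offset,
 * under round-robin striping. */
static int server_of(long long offset)
{
    return (int)((offset / stripe_size) % num_servers);
}

/* Order requests first by owning I/O node, then by file offset, so
 * that each node sees a monotonically increasing (sequential) stream. */
static int cmp_requests(const void *a, const void *b)
{
    const io_request *ra = a, *rb = b;
    int sa = server_of(ra->offset);
    int sb = server_of(rb->offset);
    if (sa != sb) return sa - sb;
    if (ra->offset < rb->offset) return -1;
    if (ra->offset > rb->offset) return  1;
    return 0;
}

int main(void)
{
    /* Interleaved requests from several processes, as a collective
     * I/O call might gather them before issuing file-system I/O. */
    io_request reqs[] = {
        { 0 * 64 * 1024, 64 * 1024, 0 },
        { 5 * 64 * 1024, 64 * 1024, 1 },
        { 2 * 64 * 1024, 64 * 1024, 2 },
        { 7 * 64 * 1024, 64 * 1024, 3 },
        { 4 * 64 * 1024, 64 * 1024, 0 },
        { 1 * 64 * 1024, 64 * 1024, 1 },
    };
    size_t n = sizeof(reqs) / sizeof(reqs[0]);

    qsort(reqs, n, sizeof(reqs[0]), cmp_requests);

    for (size_t i = 0; i < n; i++)
        printf("server %d <- offset %lld (rank %d)\n",
               server_of(reqs[i].offset), reqs[i].offset, reqs[i].rank);
    return 0;
}

Compiled with any C compiler, the program prints the requests grouped by I/O node in increasing offset order, i.e., the order in which a layout-aware (resonant) schedule would dispatch them; the actual ROMIO-based implementation additionally obtains the real striping parameters from the file system's metadata server.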