Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems

  • Authors:
  • Xuechen Zhang; Song Jiang; Kei Davis

  • Affiliations:
  • ECE Department, Wayne State University, Detroit, MI 48202, USA; ECE Department, Wayne State University, Detroit, MI 48202, USA; Computer and Computational Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

  • Venue:
  • IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
  • Year:
  • 2009

Abstract

Collective I/O is a widely used technique for improving I/O performance in parallel computing. It can be implemented as either a client-based or a server-based scheme. The client-based implementation is more widely adopted in MPI-IO software such as ROMIO because of its independence from the storage system configuration and its greater portability. However, existing implementations of client-side collective I/O do not consider the actual pattern of file striping over multiple I/O nodes in the storage system. This can cause a large number of requests for non-sequential data at the I/O nodes, substantially degrading I/O performance. Investigating the surprisingly high I/O throughput achieved when a particular request pattern happens to match the data striping pattern on the I/O nodes, we identify this resonance phenomenon as the cause. Exploiting readily available information on data striping from the metadata server in popular file systems such as PVFS2 and Lustre, we design a new collective I/O implementation technique, named resonant I/O, that makes resonance a common case. Resonant I/O rearranges requests from multiple MPI processes according to the presumed data layout on the disks of the I/O nodes, so that non-sequential disk accesses are turned into sequential accesses, significantly improving I/O performance without compromising the independence of a client-based implementation. We have implemented our design in ROMIO. Our experimental results on small- and medium-scale clusters show that the scheme can increase the I/O throughput of commonly used parallel I/O benchmarks, such as mpi-io-test and ior-mpi-io, by up to 157% over the existing ROMIO implementation, with no scenario showing significantly decreased performance.
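
The abstract describes the core idea at a high level: reorder the requests gathered by a collective I/O call according to the file's striping layout so that each I/O node receives a sequential access stream. The sketch below is not the authors' ROMIO code; it is only a minimal illustration of that reordering step under assumed parameters (round-robin striping, a fixed stripe_size, num_servers I/O nodes) and with hypothetical names (io_request, server_of, cmp_requests) introduced here for clarity.

/*
 * Minimal sketch of the reordering idea behind resonant I/O.
 * Assumptions (not taken from the paper's text): round-robin
 * striping with a fixed stripe unit across num_servers I/O nodes.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long long offset;   /* byte offset in the shared file   */
    long long length;   /* request length in bytes          */
    int       rank;     /* MPI rank that issued the request */
} io_request;

static long long stripe_size = 64 * 1024;  /* assumed stripe unit     */
static int       num_servers = 4;          /* assumed I/O node count  */

/* I/O node holding the stripe that contains this offset,
 * under round-robin striping. */
static int server_of(long long offset)
{
    return (int)((offset / stripe_size) % num_servers);
}

/* Order requests first by owning I/O node, then by file offset, so
 * that each node sees a monotonically increasing (sequential) stream. */
static int cmp_requests(const void *a, const void *b)
{
    const io_request *ra = a, *rb = b;
    int sa = server_of(ra->offset);
    int sb = server_of(rb->offset);
    if (sa != sb) return sa - sb;
    if (ra->offset < rb->offset) return -1;
    if (ra->offset > rb->offset) return  1;
    return 0;
}

int main(void)
{
    /* Interleaved requests from several processes, as a collective
     * I/O call might gather them before issuing file-system I/O. */
    io_request reqs[] = {
        { 0 * 64 * 1024, 64 * 1024, 0 },
        { 5 * 64 * 1024, 64 * 1024, 1 },
        { 2 * 64 * 1024, 64 * 1024, 2 },
        { 7 * 64 * 1024, 64 * 1024, 3 },
        { 4 * 64 * 1024, 64 * 1024, 0 },
        { 1 * 64 * 1024, 64 * 1024, 1 },
    };
    size_t n = sizeof(reqs) / sizeof(reqs[0]);

    qsort(reqs, n, sizeof(reqs[0]), cmp_requests);

    for (size_t i = 0; i < n; i++)
        printf("server %d <- offset %lld (rank %d)\n",
               server_of(reqs[i].offset), reqs[i].offset, reqs[i].rank);
    return 0;
}

Compiled with any C compiler, the program prints the requests grouped by I/O node in increasing offset order, i.e., the order in which a layout-aware (resonant) schedule would dispatch them; the actual ROMIO-based implementation additionally obtains the real striping parameters from the file system's metadata server.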