Exploiting Lustre File Joining for Effective Collective IO

Authors:
Weikuan Yu;Jeffrey Vetter;R. Shane Canon;Song Jiang
Affiliations:
Oak Ridge National Laboratory;Oak Ridge National Laboratory;Oak Ridge National Laboratory;Wayne State University
Venue:
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
Year:
2007

Citing 0
Cited 19

Implementation and Evaluation of an MPI-IO Interface for GPFS in ROMIO
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Towards a High Performance Implementation of MPI-IO on the Lustre File System
OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems

A collective I/O implementation based on inspector-executor paradigm
The Journal of Supercomputing

Optimizing server placement for parallel I/O in switch-based clusters
Journal of Parallel and Distributed Computing

Data Locality Aware Strategy for Two-Phase Collective I/O
High Performance Computing for Computational Science - VECPAR 2008

Y-lib: a user level library to increase the performance of MPI-IO in a lustre file system environment
Proceedings of the 18th ACM international symposium on High performance distributed computing

Performance Evaluation of Collective Write Algorithms in MPI I/O
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I

Evaluating Algorithms for Shared File Pointer Operations in MPI I/O
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I

PLFS: a checkpoint filesystem for parallel applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

I/O performance challenges at leadership scale
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Implementation and evaluation of active storage in modern parallel file systems
Parallel Computing

A Scalable Message Passing Interface Implementation of an Ad-Hoc Parallel I/O system
International Journal of High Performance Computing Applications

Improve throughput of storage cluster interconnected with a TCP/IP network using intelligent server grouping
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing

Virtual I/O caching: dynamic storage cache management for concurrent workloads
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Trace-based performance analysis for the petascale simulation code FLASH
International Journal of High Performance Computing Applications

A toolkit for storage qos provisioning for data-intensive applications
Building a National Distributed e-Infrastructure - PL-Grid

Transparent log-based data storage in MPI-IO applications
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Insights for exascale IO APIs from building a petascale IO API
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Structuring PLFS for extensibility
PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop

Abstract

Lustre is a parallel file system that delivers high aggregate IO bandwidth by striping file extents across many storage devices. However, our experiments indicate that excessively wide striping can degrade performance. Lustre supports an innovative file joining feature that joins files in place. To mitigate striping overhead and benefit collective IO, we propose two techniques: split writing and hierarchical striping. In split writing, a file is created as separate subfiles, each of which is striped across only a few storage devices. The subfiles are joined into a single file at file close time. Hierarchical striping builds on split writing and orchestrates the span of subfiles in a hierarchical manner to avoid overlap and achieve appropriate coverage of storage devices. Together, these techniques avoid the overhead associated with large stripe widths while still combining the bandwidth available from many storage devices. We have prototyped these techniques in the ROMIO implementation of MPI-IO. Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations. On a Lustre file system configured with 46 object storage targets, our implementation improves the collective write performance of a 16-process job by as much as 220%.
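
The placement idea behind hierarchical striping can be illustrated with a minimal sketch. The abstract does not give the actual placement algorithm, so everything here is an assumption made for illustration: the function names `plan_subfiles` and `has_overlap` are hypothetical, and the simple rule of giving each subfile a consecutive run of object storage targets (OSTs) is not taken from the paper. The sketch only shows how narrowly striped subfiles can together cover many OSTs without overlapping.

```python
def plan_subfiles(num_osts, num_subfiles, stripe_width):
    """Return, for each subfile, the list of OST indices it is striped over.

    Hypothetical layout rule (an assumption, not the paper's algorithm):
    subfile i starts at OST i * stripe_width and uses stripe_width
    consecutive OSTs, wrapping around at num_osts.
    """
    plans = []
    for i in range(num_subfiles):
        start = (i * stripe_width) % num_osts
        plans.append([(start + k) % num_osts for k in range(stripe_width)])
    return plans


def has_overlap(plans):
    """True if any OST is used by more than one subfile."""
    seen = set()
    for osts in plans:
        for ost in osts:
            if ost in seen:
                return True
            seen.add(ost)
    return False


if __name__ == "__main__":
    # 46 OSTs as in the abstract; 11 subfiles of width 4 are illustrative values.
    plans = plan_subfiles(num_osts=46, num_subfiles=11, stripe_width=4)
    for i, osts in enumerate(plans):
        print(f"subfile {i}: OSTs {osts}")
    print("overlap:", has_overlap(plans))
    print("OSTs covered:", len({o for p in plans for o in p}))
```

With these illustrative values each subfile touches only 4 OSTs, yet the subfiles together span 44 of the 46 OSTs. Adding a twelfth subfile under this rule would wrap around and reuse OSTs 0 and 1, which is the kind of overlap a placement scheme would need to avoid.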