With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle its data analysis requirements. However, to utilize such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficient. To understand the I/O performance of data-intensive applications in realistic computational settings, we develop a lightweight, portable benchmark called MADbench2, which is derived directly from a large-scale Cosmic Microwave Background (CMB) data analysis package. Our study represents one of the most comprehensive I/O analyses of modern parallel filesystems, examining a broad range of system architectures and configurations, including Lustre on the Cray XT3 and an Intel Itanium2 cluster; GPFS on IBM Power5 and AMD Opteron platforms; two BlueGene/L installations utilizing GPFS and PVFS2 filesystems; and CXFS on the SGI Altix3700. We present extensive synchronous I/O performance data comparing a number of key parameters, including concurrency, POSIX versus MPI-IO, and unique- versus shared-file accesses, using both the default environment and highly tuned I/O parameters. Finally, we explore the potential of asynchronous I/O and quantify the volume of computation required to hide a given volume of I/O. Overall, our study quantifies the vast differences in performance and functionality of parallel filesystems across state-of-the-art platforms, while providing system designers and computational scientists with a lightweight tool for conducting further analyses.
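To make the access-pattern comparison concrete, the sketch below contrasts the two file-organization strategies the abstract refers to: unique-file access, where each MPI rank writes its own file through POSIX calls, and shared-file access, where all ranks write disjoint regions of a single file through collective MPI-IO. This is a minimal illustrative example, not the MADbench2 source; the file names, buffer size, and lack of timing instrumentation are assumptions made for brevity.

```c
/* Illustrative sketch only -- NOT the MADbench2 benchmark itself.
 * Contrasts unique-file POSIX writes with shared-file collective MPI-IO.
 * File names and the per-rank data volume (NBYTES) are hypothetical. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBYTES (1 << 20)   /* 1 MiB written per process (hypothetical size) */

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(NBYTES);
    for (int i = 0; i < NBYTES; i++)
        buf[i] = (char)rank;

    /* Unique-file POSIX access: each rank writes its own file. */
    char fname[64];
    snprintf(fname, sizeof fname, "unique_%05d.dat", rank);
    FILE *fp = fopen(fname, "wb");
    fwrite(buf, 1, NBYTES, fp);
    fclose(fp);

    /* Shared-file MPI-IO access: all ranks write disjoint offsets of one
     * file with a collective call, letting the MPI-IO layer coordinate
     * the requests (e.g., via collective buffering). */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * NBYTES;
    MPI_File_write_at_all(fh, offset, buf, NBYTES, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Bracketing each of the two write phases with timers (and repeating them at varying concurrency) would yield the kind of synchronous POSIX-versus-MPI-IO, unique-versus-shared-file comparison the study reports, under the stated assumption that the data volumes and file layout here are placeholders.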