Characterizing output bottlenecks in a supercomputer

Authors:
Bing Xie;Jeffrey Chase;David Dillow;Oleg Drokin;Scott Klasky;Sarp Oral;Norbert Podhorszki
Affiliations:
Duke University, Durham, NC;Duke University, Durham, NC;Oak Ridge National Laboratory, Oak Ridge, TN;Intel Corporation, Knoxville, TN;Oak Ridge National Laboratory, Oak Ridge, TN;Oak Ridge National Laboratory, Oak Ridge, TN;Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Abstract

Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
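The effect of stragglers on coupled output such as striping can be illustrated with a minimal sketch. It is not the paper's methodology. It assumes a simplified model in which a write is split into equal chunks across storage targets and completes only when the slowest target finishes. The function names, the bandwidth figures, and the units (MB and MB/s) are hypothetical and chosen only for illustration.

```python
def striped_write_time(chunk_size, target_bandwidths):
    """Time to finish a striped write under the simplified model:
    each target receives one chunk of equal size, and the write
    completes when the slowest target has absorbed its chunk."""
    return max(chunk_size / bw for bw in target_bandwidths)


def effective_bandwidth(total_size, target_bandwidths):
    """Delivered bandwidth of the whole striped write (size / time)."""
    chunk_size = total_size / len(target_bandwidths)
    return total_size / striped_write_time(chunk_size, target_bandwidths)


# Hypothetical numbers: a 400 MB write striped over four targets.
uniform = [100.0, 100.0, 100.0, 100.0]       # MB/s, no straggler
one_straggler = [100.0, 100.0, 100.0, 25.0]  # MB/s, one slow target

print(effective_bandwidth(400.0, uniform))        # 400.0
print(effective_bandwidth(400.0, one_straggler))  # 100.0
```

Under this model the delivered bandwidth equals the number of targets times the slowest target's bandwidth, so a single slow target drags the whole write down to 100 MB/s even though the targets could absorb 325 MB/s in aggregate.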