Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support. In this study, we outline a methodology for scalable, continuous, system-wide I/O characterization that combines storage device instrumentation, static file system analysis, and a new mechanism for capturing detailed application-level behavior. This methodology allows us to identify both system-wide trends and application-specific I/O strategies. We demonstrate the effectiveness of our methodology by performing a multilevel, two-month study of Intrepid, a 557-teraflop IBM Blue Gene/P system. During that time, we captured application-level I/O characterizations from 6,481 unique jobs spanning 38 science and engineering projects. We used the results of our study to tune example applications, highlight trends that impact the design of future storage systems, and identify opportunities for improvement in I/O characterization methodology.