Dynamic software testing of MPI applications with umpire
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Practical performance portability in the Parallel Ocean Program (POP): Research Articles
Concurrency and Computation: Practice & Experience - The High Performance Architectural Challenge: Mass Market versus Proprietary Components?
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Multi-Layer Event Trace Analysis for Parallel I/O Performance Tuning
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
Preserving time in large-scale communication traces
Proceedings of the 22nd annual international conference on Supercomputing
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing
Journal of Parallel and Distributed Computing
Introducing the open trace format (OTF)
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Scalable Communication Trace Compression
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Automated tracing of I/O stack
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
A cost-intelligent application-specific data layout scheme for parallel file systems
Proceedings of the 20th international symposium on High performance distributed computing
Understanding and Improving Computational Science Storage Access through Continuous Characterization
ACM Transactions on Storage (TOS)
ScalaTrace: tracing, analysis and modeling of HPC codes at scale
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Boosting Application-Specific Parallel I/O Optimization Using IOSIG
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Understanding i/o performance using i/o skeletal applications
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Metadata Traces and Workload Models for Evaluating Big Storage Systems
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
I/O acceleration with pattern detection
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Flash caching on the storage client
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
As supercomputer performance approached and then surpassed the petaflop level, I/O performance has become a major performance bottleneck for many scientific applications. Several tools exist to collect I/O traces to assist in the analysis of I/O performance problems. However, these tools either produce extremely large trace files that complicate performance analysis, or sacrifice accuracy to collect high-level statistical information. We propose a multi-level trace generator tool, ScalaIOTrace, that collects traces at several levels in the HPC I/O stack. ScalaIOTrace features aggressive trace compression that generates trace files of near constant size for regular I/O patterns and orders of magnitudes smaller for less regular ones. This enables the collection of I/O and communication traces of applications running on thousands of processors. Our contributions also include automated trace analysis to collect selected statistical information of I/O calls by parsing the compressed trace on-the-fly and time-accurate replay of communication events with MPI-IO calls. We evaluated our approach with the Parallel Ocean Program (POP) climate simulation and the FLASH parallel I/O benchmark. POP uses NetCDF as an I/O library while FLASH I/O uses the parallel HDF5 I/O library, which internally maps onto MPI-IO. We collected MPI-IO and low-level POSIX I/O traces to study application I/O behavior. Our results show constant size trace files of only 145KB irrespective of the number of nodes for FLASH I/O benchmark, which exhibits regular I/O and communication pattern. For POP, we observe up to two orders of magnitude reduction in trace file sizes compared to flat traces. Statistical information gathered reveals insight on the number of I/O and communication calls issued in the POP and FLASH I/O. Such concise traces are unprecedented for isolated I/O and combined I/O plus communication tracing.