Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations
Applied Numerical Mathematics
Introducing the open trace format (OTF)
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Trace-based performance analysis for the petascale simulation code FLASH
International Journal of High Performance Computing Applications
An approach to creating performance visualizations in a parallel profile analysis tool
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Enabling event tracing at leadership-class scale through I/O forwarding middleware
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Optimizing I/O forwarding techniques for extreme-scale event tracing
Cluster Computing
Hi-index | 0.00 |
Contemporary high-end Terascale and Petascale systems are composed of hundreds of thousands of commodity multi-core processors interconnected with high-speed custom networks. Performance characteristics of applications executing on these systems are a function of system hardware and software as well as workload parameters. Therefore, it has become increasingly challenging to measure, analyze and project performance using a single tool on these systems. In order to address these issues, we propose a methodology for performance measurement and analysis that is aware of applications and the underlying system hierarchies. On the application level, we measure cost distribution and runtime dependent values for different components of the underlying programming model. On the system front, we measure and analyze information gathered for unique system features, particularly shared components in the multi-core processors. We demonstrate our approach using a Petascale combustion application called S3D on two high-end Teraflops systems, Cray XT4 and IBM Blue Gene/P, using a combination of hardware performance monitoring, profiling and tracing tools.