Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this approach can provide an accurate, low-overhead, tractable alternative for analyzing communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be discarded immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications, and experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework: the coverage of the sample space provided by purely random sampling is superior to that of counter- and timer-based sampling. Finally, PHOTON's design shows that modest modifications to the MPI runtime system could enable such techniques on production computing systems, and it suggests that this sampling technique could run continuously for long-running applications.
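To make the sampling comparison concrete, here is a minimal, hypothetical sketch. It does not use MPI and is not PHOTON's implementation. It is a single-process Python simulation that assumes a regular, ring-like message stream where message i goes to destination (i mod 4) + 1 at time i. The classes `CounterSampler`, `TimerSampler`, `RandomSampler`, and `SampledProfile`, along with the `run` helper, are illustrative names. Each sampled send is folded into running per-destination aggregates and then discarded, mirroring the idea of analyzing at runtime instead of storing a trace. Scaling sampled bytes by an inverse-probability weight is our own assumption; the abstract does not specify an estimator.

```python
import random
from collections import defaultdict


class CounterSampler:
    """Counter-based: sample every `period`-th message."""

    def __init__(self, period):
        self.period = period
        self.seen = 0
        self.weight = period  # each sample stands for `period` messages

    def should_sample(self, now):
        self.seen += 1
        return self.seen % self.period == 0


class TimerSampler:
    """Timer-based: sample the first message at or after each timer tick."""

    def __init__(self, interval):
        self.interval = interval
        self.next_tick = interval
        self.weight = None  # no fixed per-message inclusion probability

    def should_sample(self, now):
        if now < self.next_tick:
            return False
        while self.next_tick <= now:  # skip ticks that passed with no message
            self.next_tick += self.interval
        return True


class RandomSampler:
    """Purely random (Bernoulli): sample each message with probability p."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.weight = 1.0 / p  # assumed inverse-probability weight

    def should_sample(self, now):
        return self.rng.random() < self.p


class SampledProfile:
    """Running per-destination aggregates; raw events are never stored."""

    def __init__(self, sampler):
        self.sampler = sampler
        self.counts = defaultdict(int)
        self.bytes = defaultdict(int)

    def on_send(self, now, dest, nbytes):
        # Would be called from a (hypothetical) wrapper around a send call.
        if self.sampler.should_sample(now):
            self.counts[dest] += 1
            self.bytes[dest] += nbytes  # fold into summary, drop the event

    def estimated_total_bytes(self):
        if self.sampler.weight is None:
            raise ValueError("sampler has no fixed weight")
        return sum(self.bytes.values()) * self.sampler.weight


def run(sampler, n=4000, ndest=4, nbytes=100):
    """Replay a synthetic stream: message i goes to (i % ndest) + 1 at time i."""
    prof = SampledProfile(sampler)
    for i in range(n):
        prof.on_send(now=float(i), dest=(i % ndest) + 1, nbytes=nbytes)
    return prof


if __name__ == "__main__":
    for name, s in [("counter", CounterSampler(4)),
                    ("timer", TimerSampler(4.0)),
                    ("random", RandomSampler(0.25, seed=42))]:
        print(name, "destinations seen:", sorted(run(s).counts))
```

On this deliberately periodic stream, the counter-based sampler only ever observes destination 4 and the timer-based sampler only destination 1. Both alias with the communication pattern. Bernoulli sampling observes all four destinations while still giving a close estimate of total traffic. This is one plausible reason why random sampling gives better coverage of the sample space. Real applications are less regular than this toy stream.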