HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org

Authors:
L. Adhianto;S. Banerjee;M. Fagan;M. Krentel;G. Marin;J. Mellor-Crummey;N. R. Tallent
Affiliations:
Department of Computer Science, Rice University, P.O. Box 1892, Houston, TX 77251-1892, U.S.A. (L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, J. Mellor-Crummey, N. R. Tallent);Oak Ridge National Laboratory, One Bethel Valley Road, P.O. Box 2008 MS6173, Oak Ridge, TN 37831-6173, U.S.A. (G. Marin)
Venue:
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
Year:
2010

Cited by (36):

Workload characterization for operator-based distributed stream processing applications
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems

Exposing tunable parameters in multi-threaded numerical code
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Perfctr-Xen: a framework for performance counter virtualization
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

Towards production monitoring of application progress
Proceedings of the 4th International Workshop on Software Engineering for Computational Science and Engineering

Automatic performance debugging of SPMD-style parallel programs
Journal of Parallel and Distributed Computing

Scalable fine-grained call path tracing
Proceedings of the international conference on Supercomputing

Understanding stencil code performance on multicore architectures
Proceedings of the 8th ACM International Conference on Computing Frontiers

Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications

Bridging performance analysis tools and analytic performance modeling for HPC
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing

Reducing the overhead of direct application instrumentation using prior static analysis
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I

An evaluation of different modeling techniques for iterative compilation
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems

Pinpointing data locality problems using data-centric analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

ADP: automated diagnosis of performance pathologies using hardware events
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems

DeadSpy: a tool to pinpoint program inefficiencies
Proceedings of the Tenth International Symposium on Code Generation and Optimization

Automatic restructuring of GPU kernels for exploiting inter-thread data locality
CC'12 Proceedings of the 21st international conference on Compiler Construction

Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Performance analysis techniques for task-based OpenMP applications
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World

Review: Energy-aware performance analysis methodologies for HPC architectures-An exploratory study
Journal of Network and Computer Applications

Characterizing and mitigating work time inflation in task parallel programs
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

A multi-level monitoring framework for stream-based coordination programs
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I

Libmonitor: A tool for first-party monitoring
Parallel Computing

On the efficacy of GPU-integrated MPI for scientific applications
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Elastic and scalable tracing and accurate replay of non-deterministic events
Proceedings of the 27th international ACM conference on International conference on supercomputing

A new approach for performance analysis of openMP programs
Proceedings of the 27th international ACM conference on International conference on supercomputing

An early prototype of an autonomic performance environment for exascale
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers

Exascale workload characterization and architecture implications
Proceedings of the High Performance Computing Symposium

Detection of false sharing using machine learning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Effective sampling-driven performance tools for GPU-accelerated supercomputers
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Call Paths for Pin Tools
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Portable, MPI-interoperable coarray fortran
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

A tool to analyze the performance of multithreaded programs on NUMA architectures
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Characterizing and mitigating work time inflation in task parallel programs
Scientific Programming - Selected Papers from Super Computing 2012

Automatic identification of application I/O signatures from noisy server-side traces
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies

Optimizing I/O forwarding techniques for extreme-scale event tracing
Cluster Computing

Abstract

HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space–time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications. Copyright © 2009 John Wiley & Sons, Ltd.