HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space–time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications. Copyright © 2009 John Wiley & Sons, Ltd.