A new approach to debugging optimized code
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
ICSE '92 Proceedings of the 14th international conference on Software engineering
An integrated compilation and performance analysis environment for data parallel programs
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Hot cold optimization of large Windows/NT applications
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Nesting of reducible and irreducible loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Visualizing the performance of higher-order programs
Proceedings of the 1998 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
Low-overhead call path profiling of unmodified, optimized code
Proceedings of the 19th annual international conference on Supercomputing
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Accurate, efficient, and adaptive calling context profiling
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Identifying potential parallelism via loop-centric profiling
Proceedings of the 4th international conference on Computing frontiers
Producing wrong data without doing anything obviously wrong!
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Learning to analyze binary computer code
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Diagnosing performance bottlenecks in emerging petascale applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Analyzing lock contention in multithreaded applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Finding low-utility data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The Cilkview scalability analyzer
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Scalable fine-grained call path tracing
Proceedings of the international conference on Supercomputing
On-the-fly detection of precise loop nests across procedures on a dynamic binary translation system
Proceedings of the 8th ACM International Conference on Computing Frontiers
Hardware performance monitoring for the rest of us: a position and survey
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Pinpointing data locality problems using data-centric analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
DeadSpy: a tool to pinpoint program inefficiencies
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A new approach for performance analysis of openMP programs
Proceedings of the 27th international ACM conference on International conference on supercomputing
ACIC: automatic cloud I/O configurator for HPC applications
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Effective sampling-driven performance tools for GPU-accelerated supercomputers
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic and Adaptive Calling Context Encoding
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understanding code performance requires information about a routine's calling context. Existing performance tools fall short in this respect. Prior strategies for attributing context-sensitive performance at the source level either compromise measurement accuracy, remain too close to the binary, or require custom compilers. To understand the performance of fully optimized modular code, we developed two novel binary analysis techniques: 1) on-the-fly analysis of optimized machine code to enable minimally intrusive and accurate attribution of costs to dynamic calling contexts; and 2) post-mortem analysis of optimized machine code and its debugging sections to recover its program structure and reconstruct a mapping back to its source code. By combining the recovered static program structure with dynamic calling context information, we can accurately attribute performance metrics to calling contexts, procedures, loops, and inlined instances of procedures. We demonstrate that the fusion of this information provides unique insight into the performance of complex modular codes. This work is implemented in the HPCToolkit performance tools (http://hpctoolkit.org).