Binary analysis for measurement and attribution of program performance

Authors:
Nathan R. Tallent; John M. Mellor-Crummey; Michael W. Fagan
Affiliations:
Rice University, Houston, TX, USA (all authors)
Venue:
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Year:
2009

Abstract

Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understanding code performance requires information about a routine's calling context. Existing performance tools fall short in this respect. Prior strategies for attributing context-sensitive performance at the source level either compromise measurement accuracy, remain too close to the binary, or require custom compilers. To understand the performance of fully optimized modular code, we developed two novel binary analysis techniques: 1) on-the-fly analysis of optimized machine code to enable minimally intrusive and accurate attribution of costs to dynamic calling contexts; and 2) post-mortem analysis of optimized machine code and its debugging sections to recover its program structure and reconstruct a mapping back to its source code. By combining the recovered static program structure with dynamic calling context information, we can accurately attribute performance metrics to calling contexts, procedures, loops, and inlined instances of procedures. We demonstrate that the fusion of this information provides unique insight into the performance of complex modular codes. This work is implemented in the HPCToolkit performance tools (http://hpctoolkit.org).
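
To illustrate why costs attributed to routines in isolation can hide a problem, the following is a minimal sketch of context-sensitive attribution using a calling context tree. It is not HPCToolkit's implementation: the `CCTNode` class, the helper functions, and the sample call paths (`main`, `solve`, `log`, `memcpy`) are all hypothetical and exist only for illustration. Each sample is assumed to be a call path captured when a sampling event fired, with the outermost frame first.

```python
from collections import defaultdict


class CCTNode:
    """One node of a calling context tree: a procedure reached via one call path."""

    def __init__(self, name):
        self.name = name
        self.exclusive = 0   # samples taken while this frame was innermost
        self.children = {}   # callee name -> CCTNode

    def child(self, name):
        # Reuse the node for a callee already seen in this context.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]

    def inclusive(self):
        # Cost of this context plus everything called from it.
        return self.exclusive + sum(c.inclusive() for c in self.children.values())


def build_cct(samples):
    """Insert each sampled call path into the tree and charge its innermost frame."""
    root = CCTNode("<root>")
    for path in samples:
        node = root
        for frame in path:
            node = node.child(frame)
        node.exclusive += 1
    return root


def flat_profile(samples):
    """Context-insensitive view: charge only the innermost routine of each sample."""
    flat = defaultdict(int)
    for path in samples:
        flat[path[-1]] += 1
    return dict(flat)


# Hypothetical samples (illustrative data, not measurements).
samples = (
    [("main", "solve", "memcpy")] * 9
    + [("main", "log", "memcpy")] * 1
    + [("main", "solve")] * 2
)

cct = build_cct(samples)
flat = flat_profile(samples)
main = cct.children["main"]
solve_memcpy = main.children["solve"].children["memcpy"].exclusive
log_memcpy = main.children["log"].children["memcpy"].exclusive

print("flat profile:", flat)
print("memcpy via solve:", solve_memcpy, "| memcpy via log:", log_memcpy)
print("inclusive cost of solve:", main.children["solve"].inclusive())
```

In this toy data the flat profile reports only that `memcpy` is expensive, while the tree shows that nearly all of that cost arises when `memcpy` is called from `solve`. Attributing costs further to loops and inlined procedure instances, as the abstract describes, would additionally require the static program structure recovered from the binary, which this sketch does not attempt.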