Low-overhead call path profiling of unmodified, optimized code

Authors:
Nathan Froyd; John Mellor-Crummey; Rob Fowler
Affiliations:
Rice University, Houston, TX (all three authors)
Venue:
Proceedings of the 19th Annual International Conference on Supercomputing
Year:
2005

Abstract

Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy for collecting frequency counts for call graph edges without instrumenting every procedure's code to count them. The data structures and algorithms used are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information recorded by compilers for debugging or exception handling to record call path profiles even for highly-optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the low (2%-7%) overhead of this profiler. A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs, our sampling-based strategy for call path profiling has over an order of magnitude lower overhead.
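The calling context tree (CCT) mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each sample has already been unwound into a list of procedure names ordered outermost-first. The names `CCTNode`, `CallPathProfile`, and `record_sample` are hypothetical and do not come from the paper. The sketch does not model the paper's sample-driven call-graph edge counting or its use of compiler-recorded unwind information. It only shows how sampled call paths are merged so that each distinct calling context gets its own node with self and inclusive sample counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a procedure reached through one specific call path."""
    name: str
    samples: int = 0  # samples whose innermost frame is exactly this context
    children: dict[str, CCTNode] = field(default_factory=dict)

    def child(self, name: str) -> CCTNode:
        # Create the callee context on first use, then reuse it.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]

    def inclusive(self) -> int:
        # Inclusive samples: this context plus every context it calls.
        return self.samples + sum(c.inclusive() for c in self.children.values())


class CallPathProfile:
    """Hypothetical aggregator that merges sampled call stacks into a CCT."""

    def __init__(self) -> None:
        self.root = CCTNode("<root>")

    def record_sample(self, stack: list[str]) -> None:
        # `stack` is outermost-first, e.g. ["main", "solve", "dot"]
        # (assumed: an unwinder's leaf-first walk, already reversed).
        node = self.root
        for frame in stack:
            node = node.child(frame)
        node.samples += 1  # charge the sample to the full calling context

    def dump(self, node: CCTNode | None = None, depth: int = 0) -> list[str]:
        # Render the tree depth-first, one indented line per context.
        if node is None:
            node = self.root
        lines: list[str] = []
        for c in node.children.values():
            lines.append(f"{'  ' * depth}{c.name}: self={c.samples} incl={c.inclusive()}")
            lines.extend(self.dump(c, depth + 1))
        return lines


if __name__ == "__main__":
    profile = CallPathProfile()
    for s in (["main", "solve", "dot"], ["main", "solve", "dot"],
              ["main", "solve"], ["main", "io", "dot"]):
        profile.record_sample(s)
    print("\n".join(profile.dump()))
    # main: self=0 incl=4
    #   solve: self=1 incl=3
    #     dot: self=2 incl=2
    #   io: self=0 incl=1
    #     dot: self=1 incl=1
```

Because `dot` is reached through both `solve` and `io`, it appears as two separate nodes, each with its own sample count. A flat profile would merge them into a single `dot` entry, losing the information about which calling context consumed the time.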