Inaccuracies in program profilers
Software—Practice & Experience
Practical experience of the limitations of Gprof
Software—Practice & Experience
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
Visualizing the performance of higher-order programs
Proceedings of the 1998 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
A portable sampling-based profiler for Java virtual machines
Proceedings of the ACM 2000 conference on Java Grande
An open graph visualization system and its applications to software engineering
Software—Practice & Experience - Special issue on discrete algorithm engineering
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing
IEEE Transactions on Software Engineering
Graph Layout through the VCG Tool
GD '94 Proceedings of the DIMACS International Workshop on Graph Drawing
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Fast, accurate call graph profiling
Software—Practice & Experience
Accurate, efficient, and adaptive calling context profiling
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Whodunit: transactional profiling for multi-tier applications
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Scalability analysis of SPMD codes using expectations
Proceedings of the 21st annual international conference on Supercomputing
Performance tuning with instruction-level cost derived from call-stack sampling
ACM SIGPLAN Notices
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
DARC: dynamic analysis of root causes of latency distributions
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Controlled dynamic performance analysis
WOSP '08 Proceedings of the 7th international workshop on Software and performance
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Fast memory snapshot for concurrent programmingwithout synchronization
Proceedings of the 23rd international conference on Supercomputing
Binary analysis for measurement and attribution of program performance
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Building Approximate Calling Context from Partial Call Traces
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Diagnosing performance bottlenecks in emerging petascale applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Self-adapting service level in Java enterprise edition
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Analyzing lock contention in multithreaded applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Self-adaptation of service level in distributed systems
Software—Practice & Experience
Taming hardware event samples for FDO compilation
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Breadcrumbs: efficient context sensitivity for dynamic bug detection analyses
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Self-adapting service level in Java enterprise edition
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Netlag: a performance evaluation tool for massively multi-user networked applications
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Mining hot calling contexts in small space
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Scalable fine-grained call path tracing
Proceedings of the international conference on Supercomputing
Pinpointing data locality problems using data-centric analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Elastic and scalable tracing and accurate replay of non-deterministic events
Proceedings of the 27th international ACM conference on International conference on supercomputing
A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
DIME: time-aware dynamic binary instrumentation using rate-based resource allocation
Proceedings of the Eleventh ACM International Conference on Embedded Software
Hi-index | 0.00 |
Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy for collecting frequency counts for call graph edges without instrumenting every procedure's code to count them. The data structures and algorithms used are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information recorded by compilers for debugging or exception handling to record call path profiles even for highly-optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the low (2%-7%) overhead of this profiler. A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs, our sampling-based strategy for call path profiling has over an order of magnitude lower overhead.