We present Deep Start, a new algorithm for automated performance diagnosis that uses stack sampling to augment our search-based automated performance diagnosis strategy. Our hybrid approach locates performance problems more quickly and finds problems hidden from a more straightforward search strategy. Deep Start uses stack samples collected as a by-product of normal search instrumentation to find deep starters, functions that are likely to be application bottlenecks. Deep starters are examined early during a search to improve the likelihood of finding performance problems quickly. We implemented the Deep Start algorithm in the Performance Consultant, Paradyn's automated bottleneck detection component. Deep Start found half of our test applications' known bottlenecks 32% to 59% faster than the Performance Consultant's current call graph-based search strategy, and finished finding bottlenecks 10% to 61% faster. In addition to improving search time, Deep Start often found more bottlenecks than the call graph search strategy.
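
The idea of using stack samples to pick functions to examine early can be illustrated with a minimal Python sketch. It is not the Performance Consultant's implementation. The abstract does not say how deep starters are selected, so the rule used here (a function qualifies when it is the innermost frame in at least a given fraction of samples) is an assumption, as are the names `find_deep_starters` and `order_search`, the `threshold` parameter, and the representation of a sample as a list of function names ordered from outermost to innermost frame.

```python
from collections import Counter


def find_deep_starters(stack_samples, threshold=0.2):
    """Return functions that are the innermost frame in at least
    `threshold` (a fraction) of the samples, most frequent first.

    Assumption: each sample is a list of function names ordered from
    the outermost frame (e.g. main) to the innermost frame.
    """
    counts = Counter()
    for stack in stack_samples:
        if stack:  # ignore empty samples
            counts[stack[-1]] += 1

    total = sum(counts.values())
    if total == 0:
        return []

    frequent = [(name, n) for name, n in counts.items() if n / total >= threshold]
    # Sort by descending sample count, then by name for a stable order.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in frequent]


def order_search(candidates, deep_starters):
    """Move deep starters to the front of the search order.

    The remaining candidates keep their original relative order, which
    stands in for whatever order the underlying search would use.
    """
    candidate_set = set(candidates)
    first = [name for name in deep_starters if name in candidate_set]
    promoted = set(first)
    rest = [name for name in candidates if name not in promoted]
    return first + rest


# Hypothetical samples from a small message-passing application.
samples = [
    ["main", "solve", "exchange"],
    ["main", "solve", "exchange"],
    ["main", "solve", "compute"],
    ["main", "io"],
]
starters = find_deep_starters(samples, threshold=0.5)
search_order = order_search(["main", "solve", "io", "compute", "exchange"], starters)
```

With these samples, `exchange` is the innermost frame in two of four samples, so it is the only function that meets the 0.5 threshold and it moves to the front of the search order. Counting only the innermost frame favors functions where execution time is actually spent; counting every frame would instead favor callers near the root, such as `main`.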