Dynamic parallelization and optimization of loops is crucial for improving the performance of sequential programs, because loops account for a large fraction of execution time. Loop-level parallelism can also be extracted efficiently because loops have a regular structure. Based on the observation that only a limited number of paths are executed frequently in hot loops, we propose a hardware hot loop path detector that identifies such hot loops and their hot paths accurately, so that a dynamic optimizer can use the detected information effectively. The detector consists of a stack-structured bit-tracing unit that identifies loop paths at the subroutine level, a hot loop detector that detects hot loops using loop path information, and a hot path accumulator for loop paths. Experiments using SPEC CINT2000 show that loop paths make up a small fraction (14.46%) of Ball-Larus paths but are detected frequently (64.45% of Ball-Larus paths). A combined small-scale hot loop detector and hot path accumulator (32 entries each) attains a detection accuracy of 97.10% for the hottest loop path and 93.83% for the top two hottest loop paths and their order within hot loops.
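
The three components named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `HotLoopPathModel`, its method names, the `hot_threshold` parameter, and the policy of ignoring new entries once a table is full are all hypothetical. The abstract supplies only the overall structure (a stack of bit traces per subroutine, a hot loop table, a hot path table) and the table size of 32 entries.

```python
from collections import Counter


class HotLoopPathModel:
    """Toy software model of the detector; all names and policies are assumptions."""

    def __init__(self, entries=32, hot_threshold=2):
        self.entries = entries              # table capacity (32 in the abstract)
        self.hot_threshold = hot_threshold  # assumed iteration count that makes a loop "hot"
        self.stack = [""]                   # one branch-outcome bit trace per active subroutine
        self.loop_counts = Counter()        # stands in for the hot loop detector
        self.path_counts = Counter()        # stands in for the hot path accumulator

    def call(self):
        # Entering a subroutine starts a fresh trace, so callee branches
        # do not pollute the caller's loop path.
        self.stack.append("")

    def ret(self):
        # Returning restores the caller's partially built trace.
        if len(self.stack) > 1:
            self.stack.pop()

    def branch(self, taken):
        # Record one conditional-branch outcome as a single bit.
        self.stack[-1] += "1" if taken else "0"

    def backedge(self, loop_id):
        # A loop back edge closes one iteration: the bits collected so far
        # identify the path taken through the loop body.
        path = self.stack[-1]
        self.stack[-1] = ""
        if loop_id not in self.loop_counts and len(self.loop_counts) >= self.entries:
            return  # assumed policy: a full table ignores new loops
        self.loop_counts[loop_id] += 1
        if self.loop_counts[loop_id] < self.hot_threshold:
            return  # loop is not hot yet, so its paths are not accumulated
        key = (loop_id, path)
        if key not in self.path_counts and len(self.path_counts) >= self.entries:
            return  # assumed policy: a full table ignores new paths
        self.path_counts[key] += 1

    def hottest_paths(self, loop_id, n=2):
        # Return up to n (path, count) pairs for one loop, hottest first.
        per_loop = Counter(
            {p: c for (lid, p), c in self.path_counts.items() if lid == loop_id}
        )
        return per_loop.most_common(n)
```

A driver would call `branch` for each conditional branch, `call` and `ret` around subroutine invocations, and `backedge` with a loop identifier (for example, the target address of the backward branch) at the end of each iteration. `hottest_paths(loop_id, 2)` then corresponds to the "top two hottest loop paths and their order" that the abstract reports accuracy for.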