For aggressive path-based program optimizations to be profitable in cost-sensitive environments, accurate path profiles must be available at low overhead. In this paper, we propose a low-overhead, non-intrusive hardware path profiling scheme that can be programmed to detect several types of paths, including acyclic intra-procedural paths, the paths that make up a Whole Program Path, and extended paths. The profiler consists of two components. The first is a path stack, which detects paths and generates a sequence of path descriptors using branch information from the processor pipeline. The second is a hot path table, which collects a profile of hot paths for later use by a program optimizer. With assistance from the processor's event detection logic, our profiler can track a variety of architectural metrics along paths, enabling context-sensitive performance monitoring and bottleneck analysis. We illustrate the utility of our scheme by associating paths with a power metric that estimates the power consumed in the cache hierarchy by instructions along each path. Experiments using programs from the SPEC CPU2000 benchmark suite show that our path profiler, occupying 7 KB of hardware real estate, collects accurate path profiles (an average overlap of 88% with a perfect profile) at negligible execution-time overhead (0.6% on average).
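The abstract describes the profiler only at the block level, so the following Python model is a minimal software sketch under our own assumptions, not the paper's hardware design. We assume that a path is identified by its start PC plus its sequence of branch outcomes, and that an acyclic intra-procedural path ends at a taken backward branch or at a procedure return. We also assume the hot path table uses a hypothetical least-frequently-used replacement policy and that `event()` stands in for the event detection logic charging a metric (such as a cache power estimate) to the current path. Finally, `overlap` uses one common definition of profile overlap: the sum, over all paths, of the smaller of the path's relative frequencies in the two profiles. All class and function names are hypothetical. Whole Program Path and extended-path modes are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical path descriptor: (path start PC, string of taken/not-taken bits).
PathId = Tuple[int, str]


@dataclass
class Frame:
    """One path-stack entry: the in-progress path of one procedure activation."""
    start_pc: int
    bits: List[str] = field(default_factory=list)
    metric: float = 0.0  # e.g. estimated cache energy charged to this path


class HotPathTable:
    """Fixed-capacity table of path counters (hypothetical LFU eviction)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Dict[PathId, int] = {}
        self.metric: Dict[PathId, float] = {}

    def record(self, frame: Frame) -> None:
        pid = (frame.start_pc, "".join(frame.bits))
        if pid not in self.counts and len(self.counts) >= self.capacity:
            # Table full: evict the least frequently seen path.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
            del self.metric[victim]
        self.counts[pid] = self.counts.get(pid, 0) + 1
        self.metric[pid] = self.metric.get(pid, 0.0) + frame.metric


class PathStack:
    """Detects acyclic, intra-procedural paths from retired control flow."""

    def __init__(self, table: HotPathTable, entry_pc: int) -> None:
        self.table = table
        self.frames: List[Frame] = [Frame(entry_pc)]

    def branch(self, pc: int, target: int, taken: bool) -> None:
        top = self.frames[-1]
        top.bits.append("1" if taken else "0")
        if taken and target <= pc:
            # A taken backward branch (loop back edge) ends the current path;
            # the next path starts at the loop header.
            self.table.record(top)
            self.frames[-1] = Frame(target)

    def call(self, target: int) -> None:
        # Suspend the caller's path and start a new path in the callee.
        self.frames.append(Frame(target))

    def ret(self) -> None:
        # The callee's path ends here; the caller's suspended path resumes.
        self.table.record(self.frames.pop())

    def event(self, cost: float) -> None:
        # Charge an architectural event (e.g. a cache access) to the current path.
        self.frames[-1].metric += cost


def overlap(estimated: Dict[PathId, int], perfect: Dict[PathId, int]) -> float:
    """Sum over paths of the smaller relative frequency; 1.0 means identical."""
    est_total, per_total = sum(estimated.values()), sum(perfect.values())
    if est_total == 0 or per_total == 0:
        return 0.0
    return sum(
        min(estimated.get(p, 0) / est_total, perfect.get(p, 0) / per_total)
        for p in set(estimated) | set(perfect)
    )


# Usage: a loop whose back edge is the branch at 0x120, run for three iterations.
table = HotPathTable(capacity=64)
ps = PathStack(table, entry_pc=0x100)
for _ in range(3):
    ps.branch(0x110, 0x118, taken=False)
    ps.event(1.0)  # one unit of cache energy per iteration
    ps.branch(0x120, 0x104, taken=True)
ps.branch(0x110, 0x118, taken=True)
ps.branch(0x120, 0x104, taken=False)  # loop exit
ps.ret()
print(table.counts)
```

In this model the first iteration forms its own path (it starts at the procedure entry), while the second and third iterations share one path starting at the loop header. Because the model keeps a per-path metric next to each counter, it also shows how a hot path table could report a cost such as cache energy for each hot path instead of only its frequency.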