A Programmable Hardware Path Profiler

Authors:
Kapil Vaswani;Matthew J. Thazhuthaveetil;Y. N. Srikant
Affiliations:
Indian Institute of Science, Bangalore;Indian Institute of Science, Bangalore;Indian Institute of Science, Bangalore
Venue:
Proceedings of the international symposium on Code generation and optimization
Year:
2005

Abstract

For aggressive path-based program optimizations to be profitable in cost-sensitive environments, accurate path profiles must be available at low overhead. In this paper, we propose a low-overhead, non-intrusive hardware path profiling scheme that can be programmed to detect several types of paths, including acyclic intra-procedural paths, paths for a Whole Program Path, and extended paths. The profiler consists of a path stack, which detects paths and generates a sequence of path descriptors using branch information from the processor pipeline, and a hot path table, which collects a profile of hot paths for later use by a program optimizer. With assistance from the processor's event detection logic, our profiler can track a host of architectural metrics along paths, enabling context-sensitive performance monitoring and bottleneck analysis. We illustrate the utility of our scheme by associating paths with a power metric that estimates the power consumed in the cache hierarchy by instructions along the path. Experiments using programs from the SPEC CPU2000 benchmark suite show that our path profiler, occupying 7 KB of hardware real estate, collects accurate path profiles (average overlap of 88% with a perfect profile) at negligible execution-time overhead (0.6% on average).
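
The two components named in the abstract can be pictured with a small software model. The sketch below is illustrative only and is not the authors' hardware design: the `Event` format, the descriptor layout (path start address plus branch outcomes), the rule that a taken backward branch ends an acyclic path, and the evict-the-coldest replacement policy are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical event format; the abstract does not specify one.
# kind is "branch", "call", or "ret"; taken matters only for branches.
Event = namedtuple("Event", "kind pc target taken")


class HotPathTable:
    """Bounded table of path counters (replacement policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def record(self, descriptor):
        if descriptor in self.counts:
            self.counts[descriptor] += 1
            return
        if len(self.counts) >= self.capacity:
            # Table is full: evict the coldest entry to make room.
            coldest = min(self.counts, key=self.counts.get)
            del self.counts[coldest]
        self.counts[descriptor] = 1


def profile(events, table, entry_pc=0):
    """Turn an event stream into path descriptors and count them."""
    # Each stack entry is [path start pc, list of branch outcomes so far].
    stack = [[entry_pc, []]]
    for ev in events:
        top = stack[-1]
        if ev.kind == "branch":
            top[1].append(int(ev.taken))
            if ev.taken and ev.target <= ev.pc:
                # Taken backward branch: the acyclic path ends here,
                # and a new path starts at the branch target.
                table.record((top[0], tuple(top[1])))
                stack[-1] = [ev.target, []]
        elif ev.kind == "call":
            # Suspend the caller's path and start a fresh one in the callee.
            stack.append([ev.target, []])
        elif ev.kind == "ret":
            # The callee's path ends; the caller's suspended path resumes.
            table.record((top[0], tuple(top[1])))
            stack.pop()
            if not stack:
                stack.append([ev.target, []])
    return table


if __name__ == "__main__":
    # Four iterations of a loop at 0x10 with one inner forward branch.
    events = []
    for inner_taken in [True, True, False, True]:
        events.append(Event("branch", 0x14, 0x18, inner_taken))
        events.append(Event("branch", 0x20, 0x10, True))
    table = profile(events, HotPathTable(capacity=8), entry_pc=0x10)
    for descriptor, count in table.counts.items():
        print(descriptor, count)
```

In this model the stack keeps one partial path per active procedure, so a call does not cut the caller's intra-procedural path short. Other path types mentioned in the abstract would need different termination rules, which are not modelled here.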