Catching Accurate Profiles in Hardware

Authors:
Satish Narayanasamy;Timothy Sherwood;Suleyman Sair;Brad Calder;George Varghese
Affiliations:
-;-;-;-;-
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003

Abstract

Run-time optimization is one of the most important ways of getting performance out of modern processors. Techniques such as prefetching, trace caching, and memory disambiguation are all based upon the principle of observation followed by adaptation, and all make use of some sort of profile information gathered at run-time. Programs are very complex, and the real trick in generating useful run-time profiles is sifting through all the unimportant and infrequently occurring events to find those that are important enough to warrant optimization. In this paper, we present the Multi-Hash architecture to catch important events even in the presence of extensive noise. Multi-Hash uses a small amount of area, between 7 and 16 kilobytes, to accurately capture these important events in hardware, without requiring any software support. This is achieved using multiple hash tables for the filtering, and interval-based profiling to help identify how important an event is in relation to all the other events. We evaluate our design for value and edge profiling, and show that over a set of benchmarks, we get an average error of less than 1%.
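
The abstract names two ingredients, filtering through multiple hash tables and profiling over fixed intervals, without giving details. The Python sketch below is one possible software illustration of that general idea, not the hardware design evaluated in the paper. The class name `MultiHashProfiler`, its parameters (`num_tables`, `table_size`, `interval`, `threshold`), the use of BLAKE2b as the hash, and the separate candidate table are all assumptions made for illustration.

```python
import hashlib


class MultiHashProfiler:
    """Illustrative sketch only; names, sizes, and policies are assumptions."""

    def __init__(self, num_tables=4, table_size=512, interval=1000, threshold=0.05):
        self.num_tables = num_tables
        self.table_size = table_size
        self.interval = interval
        # An event counts as "important" if it makes up at least this many
        # of the events seen in one interval.
        self.min_count = max(1, int(interval * threshold))
        self.tables = [[0] * table_size for _ in range(num_tables)]
        self.candidates = {}  # events that passed the filter in this interval
        self.seen = 0
        self.reports = []     # one {event: estimated count} dict per interval

    def _index(self, table_id, event):
        # A differently salted hash per table stands in for independent
        # hardware hash functions.
        digest = hashlib.blake2b(
            repr(event).encode(), digest_size=8, salt=bytes([table_id])
        ).digest()
        return int.from_bytes(digest, "little") % self.table_size

    def record(self, event):
        if event in self.candidates:
            # Already promoted: count it exactly from here on.
            self.candidates[event] += 1
        else:
            slots = [(t, self._index(t, event)) for t in range(self.num_tables)]
            for t, i in slots:
                self.tables[t][i] += 1
            # The smallest counter is the least aliased estimate. An event is
            # promoted only if it clears the threshold in every table.
            smallest = min(self.tables[t][i] for t, i in slots)
            if smallest >= self.min_count:
                self.candidates[event] = smallest
        self.seen += 1
        if self.seen == self.interval:
            self._end_interval()

    def _end_interval(self):
        # Report this interval's important events, then start afresh.
        self.reports.append(dict(self.candidates))
        for table in self.tables:
            table[:] = [0] * self.table_size
        self.candidates.clear()
        self.seen = 0


if __name__ == "__main__":
    profiler = MultiHashProfiler()
    # One frequent event mixed with many one-off "noise" events.
    for i in range(1000):
        profiler.record("hot" if i % 4 == 0 else ("noise", i))
    print(profiler.reports[-1])
```

Requiring an event to clear the threshold in every table means that an infrequent event is mistaken for an important one only if it collides with busy counters in all tables at once, which is why several small tables can filter noise better than a single table of the same total size. Resetting at the end of each interval makes the threshold a fraction of the events in that interval, so importance is measured relative to all other events seen in the same window.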