Rapid profiling via stratified sampling

Authors:
S. Subramanya Sastry;Rastislav Bodík;James E. Smith
Affiliations:
Computer Sciences Dept., University of Wisconsin-Madison;Computer Sciences Dept., University of Wisconsin-Madison;Dept. of ECE, University of Wisconsin-Madison
Venue:
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Year:
2001



Abstract

Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously.

Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
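
The stratified periodic sampler described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: events are modeled as integer IDs, the function names (stratum_of, stratified_periodic_sample, random_sample, estimate_counts), the multiplicative hash, the number of substreams, and the value of k are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def stratum_of(event, num_strata):
    """Assumed stand-in for the hardware hash: map an integer event ID to a substream."""
    mixed = (event * 2654435761) & 0xFFFFFFFF  # 32-bit multiplicative hash
    return (mixed >> 16) % num_strata


def stratified_periodic_sample(events, num_strata=16, k=8):
    """Hash-split the stream into substreams and keep every k-th event of each one."""
    seen = [0] * num_strata      # events observed so far in each substream
    samples = Counter()          # sampled occurrences per event
    for event in events:
        s = stratum_of(event, num_strata)
        seen[s] += 1
        if seen[s] % k == 0:     # periodic sampler: select every k-th event
            samples[event] += 1
    return samples


def random_sample(events, k=8, seed=0):
    """Conventional random sampling baseline: keep each event with probability 1/k."""
    rng = random.Random(seed)
    samples = Counter()
    for event in events:
        if rng.random() < 1.0 / k:
            samples[event] += 1
    return samples


def estimate_counts(samples, k):
    """Scale sampled counts by k to estimate the true event frequencies."""
    return {event: count * k for event, count in samples.items()}


if __name__ == "__main__":
    # Synthetic skewed stream (an assumption): a few hot events plus a cold tail.
    rng = random.Random(1)
    hot = [rng.choice([3, 17, 42]) for _ in range(9000)]
    cold = [rng.randrange(1000, 2000) for _ in range(1000)]
    stream = hot + cold
    rng.shuffle(stream)

    truth = Counter(stream)
    strat = estimate_counts(stratified_periodic_sample(stream), 8)
    rand = estimate_counts(random_sample(stream), 8)
    for event in (3, 17, 42):
        print(event, truth[event], strat.get(event, 0), rand.get(event, 0))
```

In this model, a substream dominated by a single hot event is sampled almost deterministically, so its estimated count is off by less than k; that is the intuition behind the abstract's claim that stratification reduces sampling error. The script prints true, stratified, and random estimates side by side so the two samplers can be compared on a stream of your choosing.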