Using profile information to assist classic code optimizations
Software—Practice & Experience
Profile-guided automatic inline expansion for C programs
Software—Practice & Experience
Partial evaluation and automatic program generation
Partial evaluation and automatic program generation
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Adaptive optimization for self: reconciling high performance with exploratory programming
Adaptive optimization for self: reconciling high performance with exploratory programming
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Accurate and practical profile-driven compilation using the profile buffer
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
tcc: a system for fast, flexible, and high-level dynamic code generation
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The Jalapeño dynamic optimizing compiler for Java
JAVA '99 Proceedings of the ACM 1999 conference on Java Grande
Compiler-directed dynamic computation reuse: rationale and initial results
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Adaptive optimization in the Jalapeño JVM
OOPSLA '00 Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Relational profiling: enabling thread-level parallelism in virtual machines
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Calpa: a tool for automating selective dynamic compilation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Efficient and flexible value sampling
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Software profiling for hot path prediction: less is more
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The IA-64 Architecture at Work
Computer
Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches
ECOOP '91 Proceedings of the European Conference on Object-Oriented Programming
Code Specialization Based on Value Profiles
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
Exploiting Basic Block Value Locality with Block Reuse
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
A Programmable Co-processor for Profiling
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Online feedback-directed optimization of Java
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
New directions in traffic measurement and accounting
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Catching Accurate Profiles in Hardware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)
Static Identification of Delinquent Loads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the international symposium on Code generation and optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Runtime specialization with optimistic heap analysis
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
Efficient remote profiling for resource-constrained devices
ACM Transactions on Architecture and Code Optimization (TACO)
Shadow Profiling: Hiding Instrumentation Costs with Parallelism
Proceedings of the International Symposium on Code Generation and Optimization
Design of a two-level hot path detector for path-based loop optimizations
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
Core monitors: monitoring performance in multicore processors
Proceedings of the 6th ACM conference on Computing frontiers
A hardware hot loop path detector for dynamic parallelization and optimization
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
On linewidth-based yield analysis for nanometer lithography
Proceedings of the Conference on Design, Automation and Test in Europe
DeFT: Design space exploration for on-the-fly detection of coherence misses
ACM Transactions on Architecture and Code Optimization (TACO)
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile dam is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously.Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error, and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.