VHC: Quickly Building an Optimizer for Complex Embedded Architectures

Authors:
Michael Dupré; Nathalie Drach; Olivier Temam
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

Abstract

To meet the high demand for powerful embedded processors, VLIW architectures are becoming increasingly complex (e.g., multiple clusters), and they now run increasingly sophisticated control-intensive applications. As a result, developing architecture-specific compiler optimizations is becoming both more critical and more complex, while time-to-market constraints remain very tight. In this article, we present a novel program optimization approach, called the Virtual Hardware Compiler (VHC), that can perform as well as static compiler optimizations but requires far less compiler development effort, even for complex VLIW architectures and complex target applications. The principle is to augment the target processor simulator with superscalar-like features, observe how the target program is dynamically optimized during execution, and deduce an optimized binary for the static VLIW architecture. Developing an architecture-specific optimizer then amounts to modifying the processor simulator, which is much faster than adapting static compiler optimizations to an architecture. We also show that a VHC-optimized binary trained on a number of data sets performs as well as a statically optimized binary on other test data sets. The only drawback of the approach is a significantly longer compilation time, which is often acceptable for embedded applications and devices. Using the Texas Instruments C62 VLIW processor and the associated compiler, we experimentally show that this approach performs as well as static compiler optimizations for a much lower research and development effort. Using a single-core C60 and a dual-core clustered C62 processor, we also show that the same approach can be used to efficiently retarget binary programs within a family of processors.
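To make the "observe, then deduce a static binary" principle concrete, here is a deliberately simplified sketch. It is not the authors' VHC implementation. It assumes a single basic block, fixed instruction latencies, registers written at most once in the block (so only true data dependences constrain ordering), and an idealized issue model; the `Insn` record and both helper functions are hypothetical names. An idealized superscalar-like issue stage records the cycle at which each instruction would issue dynamically, and those observed cycles are then frozen into static VLIW bundles. A real system would also have to merge observations across several training data sets, since branches and memory behavior vary with the input.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Insn:
    name: str
    dest: str               # register written (assumed: written once per block)
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles until `dest` is available to consumers


def observe_dynamic_schedule(block: List[Insn], issue_width: int) -> List[int]:
    """Idealized superscalar-like issue: each instruction, taken in program
    order, issues at the earliest cycle where all of its operands are ready
    and fewer than `issue_width` instructions have already issued. Later
    instructions may therefore land in earlier cycles (out-of-order issue)."""
    ready_at: Dict[str, int] = {}   # register -> cycle its value is available
    used: Dict[int, int] = {}       # cycle -> issue slots already taken
    cycles: List[int] = []
    for insn in block:
        # Registers not produced inside the block are live-in, ready at cycle 0.
        cycle = max((ready_at.get(r, 0) for r in insn.srcs), default=0)
        while used.get(cycle, 0) >= issue_width:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        ready_at[insn.dest] = cycle + insn.latency
        cycles.append(cycle)
    return cycles


def to_vliw_bundles(block: List[Insn], cycles: List[int]) -> List[List[str]]:
    """Freeze the observed issue cycles into static VLIW bundles. An empty
    bundle stands for a NOP cycle that the static code must preserve."""
    n_cycles = max(cycles) + 1 if cycles else 0
    bundles: List[List[str]] = [[] for _ in range(n_cycles)]
    for insn, cycle in zip(block, cycles):
        bundles[cycle].append(insn.name)
    return bundles


if __name__ == "__main__":
    block = [
        Insn("load_a", "r1", ("p",), 5),
        Insn("load_b", "r2", ("q",), 5),
        Insn("add", "r3", ("r1", "r2"), 1),
        Insn("mul", "r4", ("k",), 2),
    ]
    cycles = observe_dynamic_schedule(block, issue_width=2)
    print(to_vliw_bundles(block, cycles))
    # [['load_a', 'load_b'], ['mul'], [], [], [], ['add']]
```

In this toy run, the independent `mul` is hoisted above the `add` that waits on both loads, which is the kind of reordering a dynamic scheduler discovers and a static VLIW binary can then encode directly.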