The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Accurate and practical profile-driven compilation using the profile buffer
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
On the importance of points-to analysis and other memory disambiguation methods for C programs
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Transparent data-memory organizations for digital signal processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler Design Issues for Embedded Processors
IEEE Design & Test
Workload Design: Selecting Representative Program-Input Pairs
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Speculative Alias Analysis for Executable Code
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Effective Compilation Support for Variable Instruction Set Architecture
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
DELI: a new run-time control point
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Coupling on-line and off-line profile information to improve program performance
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Dynamically Scheduling VLIW Instructions with Dependency Information
INTERACT '02 Proceedings of the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures
Automatic instruction scheduler retargeting by reverse-engineering
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
To meet the high demand for powerful embedded processors,VLIW architectures are increasingly complex (e.g.,multiple clusters), and moreover, they now run increasinglysophisticated control-intensive applications. As a result, developingarchitecture-specific compiler optimizations is becomingboth increasingly critical and complex, while time-to-market constraints remain very tight.In this article, we present a novel program optimizationapproach, called the Virtual Hardware Compiler (VHC),that can perform as well as static compiler optimizations,but which requires far less compiler development effort,even for complex VLIW architectures and complex targetapplications. The principle is to augment the target processorsimulator with superscalar-like features, observe howthe target program is dynamically optimized during execution,and deduce an optimized binary for the static VLIWarchitecture. Developing an architecture-specific optimizerthen amounts to modifying the processor simulator whichis very fast compared to adapting static compiler optimizationsto an architecture. We also show that a VHC-optimizedbinary trained on a number of data sets performs as wellas a statically-optimized binary on other test data sets. Theonly drawback of the approach is a largely increased compilationtime, which is often acceptable for embedded applicationsand devices. Using the Texas Instruments C62 VLIWprocessor and the associated compiler, we experimentallyshow that this approach performs as well as static compileroptimizations for a much lower research and developmenteffort. Using a single-core C60 and a dual-core clusteredC62 processors, we also show that the same approach canbe used for efficiently retargeting binary programs within afamily of processors.