Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Characterizing computer performance with a single number
Communications of the ACM
Global value numbers and redundant computations
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Region Scheduling: An Approach for Detecting and Redistributing Parallelism
IEEE Transactions on Software Engineering
Global instruction scheduling for superscalar machines
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Code duplication: an assist for global instruction scheduling
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Using profile information to assist classic code optimizations
Software—Practice & Experience
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Program analysis and optimization for machines with instruction cache
Program analysis and optimization for machines with instruction cache
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
A global resource-constrained parallelization technique
ICS '89 Proceedings of the 3rd international conference on Supercomputing
An Algorithm for Structuring Flowgraphs
Journal of the ACM (JACM)
Global optimization by suppression of partial redundancies
Communications of the ACM
Parallel processing: a smart compiler and a dumb machine
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A Guidebook to FORTRAN on Supercomputers
A Guidebook to FORTRAN on Supercomputers
Percolation Scheduling: A Parallel Compilation Technique
Percolation Scheduling: A Parallel Compilation Technique
ACM Computing Surveys (CSUR)
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Comparing Tail Duplication with Compensation Code in Single Path Global Instruction Scheduling
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Reducing code size in VLIW instruction scheduling
Journal of Embedded Computing - Low-power Embedded Systems
Optimal trace scheduling using enumeration
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Hi-index | 0.00 |
Trace scheduling is an optimization technique that selects a sequence of basic blocks as a trace and schedules the operations from the trace together. If an operation is moved across basic block boundaries, one or more compensation copies may be required in the off-trace code. This article discusses the generation of compensation code in a trace scheduling compiler and presents techniques for limiting the amount of compensation code: avoidance (restricting code motion so that no compensation code is required) and suppression (analyzing the global flow of the program to detect when a copy is redundant). We evaluate the effectiveness of these techniques based on measurements for the SPEC89 suite and the Livermore Fortran Kernels, using our implementation of trace scheduling for a Multiflow Trace 7/300. The article compares different compiler models contrasting the performance of trace scheduling with the performance obtained from typical RISC compilation techniques.There are two key results of this study. First, the amount of compensation code generated is not large. For the SPEC89 suite, the average code size increase due to trace scheduling is 6%. Avoidance is more important than suppression, although there are some kernels that benefit significantly from compensation code suppression. Since compensation code is not a major issue, a compiler can be more aggressive in code motion and loop unrolling. Second, compensation code is not critical to obtain the benefits of trace scheduling. Our implementation of trace scheduling improves the SPEC mark rating by 30% over basic block scheduling, but restricting trace scheduling so that no compensation code is required improves the rating by 25%. This indicates that most basic block scheduling techniques can be extended to trace scheduling without requiring any complicated compensation code bookkeeping.