Merging Head and Tail Duplication for Convergent Hyperblock Formation

Authors:
Bertrand A. Maher;Aaron Smith;Doug Burger;Kathryn S. McKinley
Affiliations:
University of Texas at Austin;University of Texas at Austin;University of Texas at Austin;University of Texas at Austin
Venue:
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2006

Citing 18
Cited 4

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Using profile information to assist classic code optimizations

Software—Practice & Experience
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Resource-Constrained Software Pipelining

IEEE Transactions on Parallel and Distributed Systems
Speculative hedge: regulating compile-time speculation against profile variations

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting instruction level parallelism in the presence of conditional branches

Exploiting instruction level parallelism in the presence of conditional branches
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication

International Journal of Parallel Programming
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Treegion Scheduling for Wide Issue Processors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Scaling to the End of Silicon with EDGE Architectures

Computer
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Tree compaction of microprograms

ACM SIGMICRO Newsletter
Compiling for EDGE Architectures

Proceedings of the International Symposium on Code Generation and Optimization
A spatial path scheduling algorithm for EDGE architectures

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Dataflow Predication

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Dataflow Predication

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Feature selection and policy optimization for distributed instruction placement using reinforcement learning

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Register Bank Assignment for Spatially Partitioned Processors

Languages and Compilers for Parallel Computing
An evaluation of the TRIPS computer system

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

VLIW and EDGE (Explicit Data Graph Execution) ar- chitectures rely on compilers to form high-quality hyper- blocks for good performance. These compilers typically perform hyperblock formation, loop unrolling, and scalar optimizations in a fixed order. This approach limits the compiler's ability to exploit or correct interactions among these phases. EDGE architectures exacerbate this problem by imposing structural constraints on hyperblocks, such as instruction count and instruction composition. This paper presents convergent hyperblock formation, which iteratively applies if-conversion, peeling, unrolling, and scalar optimizations until converging on hyperblocks that are as close as possible to the structural constraints. To perform peeling and unrolling, convergent hyperblock formation generalizes tail duplication, which removes side entrances to acyclic traces, to remove back edges into cyclic traces using head duplication. Simulation results for an EDGE architecture show that convergent hyperblock formation improves code quality over discrete-phase ap- proaches with heuristics for VLIW and EDGE. This algo- rithm offers a solution to hyperblock phase ordering prob- lems and can be configured to implement a wide range of policies.