Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Highly concurrent scalar processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sentinel scheduling: a model for compiler-controlled speculative execution
ACM Transactions on Computer Systems (TOCS)
The Cydra 5 minisupercomputer: architecture and implementation
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Structure of Computers and Computations
Structure of Computers and Computations
Data Flow and Dependence Analysis for Instruction Level Parallelism
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Parallelism, memory anti-aliasing and correctness for trace scheduling compilers (disambiguation, flow-analysis, compaction)
Optimum modulo schedules for minimum register requirements
ICS '95 Proceedings of the 9th international conference on Supercomputing
Critical path reduction for scalar programs
Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation
Proceedings of the 28th annual international symposium on Microarchitecture
Register allocation for predicated code
Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of full and partial predicated execution support for ILP processors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A framework for balancing control flow and predication
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication
International Journal of Parallel Programming
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Path Analysis and Renaming for Predicated Instruction Scheduling
International Journal of Parallel Programming
Optimization of Machine Descriptions for Efficient Use
International Journal of Parallel Programming
Hybrid Predication Model for Instruction Level Parallelism
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Hi-index | 0.00 |
The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. While height reduction techniques are not always helpful, their utility can be demonstrated in a broad range of important situations.This paper focuses on the height reduction of control recurrences within loops with data dependent exits. Loops with exits are transformed so as to alleviate performance bottlenecks resulting from control dependences. A compilation approach to effect these transformations is described. The techniques presented in this paper used in combination with prior work on reducing the height of data dependences provide a comprehensive approach to accelerating loops with conditional exits.In many cases, loops with conditional exits provide a degree of parallelism traditionally associated with vectorization. Multiple iterations of a loop can be retired in a single cycle on a processor with adequate instruction level parallelism with no cost in code redundancy. In more difficult cases, height reduction requires redundant computation or may not be feasible.