Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling

Authors:
Suhyun Kim; Soo-Mook Moon; Jinpyo Park; Kemal Ebcioglu
Venue:
IEEE Transactions on Computers
Year:
2002

Abstract

Enhanced pipeline scheduling (EPS) is a software pipelining technique that can achieve a variable initiation interval (II) for loops with control flow through its code motion pipelining. EPS, however, leaves behind many renaming copy instructions that cannot be coalesced because of interferences. These copies consume resources and, more seriously, may cause a stall when they rename a multi-latency instruction whose latency is longer than the II that EPS aims for. This paper proposes a code transformation technique based on loop unrolling that makes those copies coalescible. Two unique features of the technique are its method of determining the precise unroll amount, based on the idea of extended live ranges, and its insertion of special bookkeeping copies at loop exits. The proposed technique enables EPS to avoid a serious slowdown from latency handling and resource pressure while keeping its variable II and other advantages. In fact, renaming through copies, followed by unroll-based copy elimination, is EPS's solution to the cross-iteration register overwrite problem in software pipelining. It works for loops with arbitrary control flow, which EPS must handle, as well as for straight-line loops. Our empirical study, performed on a VLIW testbed with a two-cycle load latency, shows that 86 percent of the otherwise uncoalescible copies in innermost loops become coalescible when the loops are unrolled 2.2 times on average. In addition, the unroll amount obtained is shown to be precise and the most efficient. The unrolled version of the VLIW code includes fewer no-op VLIWs caused by stalls, improving performance by a geometric mean of 18 percent on a 16-ALU machine.
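
The general idea of trading a per-iteration renaming copy for unrolling can be illustrated with a small Python sketch. This is not the paper's algorithm: the loop body, the function names, and the unroll factor of 2 are all assumptions chosen by hand for illustration, whereas the paper derives the unroll amount from extended live ranges. The first function models a pipelined loop in which the load for the next iteration is hoisted above the use of the previous value, so a copy is needed to keep the old value alive. The second function unrolls the loop twice and alternates between two names, so the copy disappears from the loop body and only a bookkeeping copy remains on one exit path.

```python
def pipelined_with_copy(a):
    """Pipelined loop with one renaming copy per iteration (illustrative)."""
    if not a:
        return 0, 0
    total, copies = 0, 0
    x = a[0]                  # prologue: first load
    for i in range(1, len(a)):
        y = x                 # renaming copy: x is about to be overwritten
        copies += 1
        x = a[i]              # load hoisted from the next iteration
        total += 2 * y        # use of the previous iteration's value
    total += 2 * x            # epilogue: use of the last loaded value
    return total, copies


def unrolled_without_copy(a):
    """Same loop unrolled by 2 (assumed factor); names x0/x1 alternate."""
    if not a:
        return 0, 0
    total, copies = 0, 0
    n = len(a)
    x0 = a[0]                 # prologue: first load
    i = 1
    while i + 1 < n:          # two original iterations per trip
        x1 = a[i]             # first copy of the body writes x1
        total += 2 * x0
        x0 = a[i + 1]         # second copy of the body writes x0
        total += 2 * x1
        i += 2
    if i < n:                 # exit after an odd number of iterations
        x1 = a[i]
        total += 2 * x0
        x0 = x1               # bookkeeping copy so the epilogue reads x0
        copies += 1
    total += 2 * x0           # epilogue
    return total, copies


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(pipelined_with_copy(data))
    print(unrolled_without_copy(data))
```

Both functions compute the same sum, but the unrolled version executes no copy inside the loop. The single copy it may execute sits on an exit path, which loosely mirrors the role of the bookkeeping copies at loop exits mentioned in the abstract.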