Software pipelining loops with conditional branches

Authors:
Mark G. Stoodley;Corinna G. Lee
Affiliations:
Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada;Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Citing 25
Cited 16

Highly concurrent scalar processing

Highly concurrent scalar processing
URPR—An extension of URCR for software pipelining

MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture

Selected papers of the second workshop on Languages and compilers for parallel computing
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
GURPR*: a new global software pipelining algorithm

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Register allocation for software pipelined loops

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Enhanced modulo scheduling for loops with conditional branches

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A dynamic-programming technique for compacting loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Instruction-level parallel processing: history, overview, and perspective

The Journal of Supercomputing - Special issue on instruction-level parallelism
Compiling for the Cydra 5

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Software pipelining

ACM Computing Surveys (CSUR)
Modulo scheduling with multiple initiation intervals

Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
GPMB—software pipelining branch-intensive loops

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A compilation technique for software pipelining of loops with conditional jumps

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
GURPR—a method for global software pipelining

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Local Microcode Compaction Techniques

ACM Computing Surveys (CSUR)
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Perfect Pipelining: A New Loop Parallelization Technique

ESOP '88 Proceedings of the 2nd European Symposium on Programming
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Trace Scheduling: A Technique for Global Microcode Compaction

IEEE Transactions on Computers

Split-path enhanced pipeline scheduling for loops with control flows

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Properties of Rescheduling Size Invariance for Dynamic Rescheduling-Based VLIW Cross-Generation Compatibility

IEEE Transactions on Computers
Control Flow Regeneration for Software Pipelined Loops with Conditions

International Journal of Parallel Programming
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling

IEEE Transactions on Computers
A finite state machine based format model of software pipelined loops with conditions

Progress in computer research
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Global Software Pipelining with Iteration Preselection

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Predicate-aware scheduling: a technique for reducing resource constraints

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling

IEEE Transactions on Parallel and Distributed Systems
Predicated Software Pipelining Technique for Loops with Conditions

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Probabilistic Predicate-Aware Modulo Scheduling

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compiler-Directed ILP Extraction for Clustered VLIW/EPIC Machines: Predication, Speculation and Modulo Scheduling

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Complementing software pipelining with software thread integration

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A new register file access architecture for software pipelining in VLIW processors

Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Compile-time and instruction-set methods for improving floating- to fixed-point conversion accuracy

ACM Transactions on Embedded Computing Systems (TECS)

Quantified Score

Hi-index	0.01

Visualization

Abstract

Software pipelining is an aggressive scheduling technique that generates efficient code for loops and is particularly effective for VLIW architectures. Few software pipelining algorithms, however, are able to efficiently schedule loops that contain conditional branches. We have developed an algorithm we call All Paths Pipelining (APP) that addresses this shortcoming of software pipelining. APP is designed to achieve optimal or near-optimal performance for any run of iterations while providing efficient code for transitioning between runs. A run is the execution of consecutive iterations that all execute the same path through a loop. APP accomplishes this by using techniques from modulo scheduling and kernel recognition algorithms, the two main approaches for software pipelining loops. We have implemented the APP algorithm in our research compiler and have evaluated its performance by executing its generated code on a VLIW instruction-set simulator. For a processor with five heterogeneous functional units, APP is able to add another 1% to 23% increase in performance over basic software pipelining by effectively pipelining loops with conditional branches.