Software pipelining: an effective scheduling technique for VLIW machines

Authors:
Monica S. Lam
Affiliations:
Stanford University
Venue:
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Year:
2004

Citing 32
Cited 2

Compilation for a high-performance systolic array

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Highly concurrent scalar processing

Highly concurrent scalar processing
URPR—An extension of URCR for software pipelining

MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
A study of scalar compilation techniques for pipelined supercomputers

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
The warp computer: Architecture, implementation, and performance

IEEE Transactions on Computers
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Compiler optimizations for asynchronous systolic array programs

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A novel framework of register allocation for software pipelining

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
Minimizing register requirements under resource-constrained rate-optimal software pipelining

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the ratio of memory operations to floating-point operations in loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining showdown: optimal vs. heuristic methods in a production compiler

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A compilation technique for software pipelining of loops with conditional jumps

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
GURPR—a method for global software pipelining

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Algorithm 97: Shortest path

Communications of the ACM
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A Fortran compiler for the FPS-164 scientific computer

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
EPIC: Explicitly Parallel Instruction Computing

Computer
Perfect Pipelining: A New Loop Parallelization Technique

ESOP '88 Proceedings of the 2nd European Symposium on Programming
Fine-Grain Scheduling under Resource Constraints

LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Global optimization of microprograms through modular control constructs

MICRO 12 Proceedings of the 12th annual workshop on Microprogramming
Improving the throughput of a pipeline by insertion of delays

ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
An improvement of trace scheduling for global microcode compaction

MICRO 17 Proceedings of the 17th annual workshop on Microprogramming
The optimization of horizontal microcode within and beyond basic blocks: an application of processor scheduling with resources

The optimization of horizontal microcode within and beyond basic blocks: an application of processor scheduling with resources
Bulldog: a compiler for vliw architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)

Bulldog: a compiler for vliw architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)
A systolic array optimizing compiler

A systolic array optimizing compiler

On minimizing register usage of linearly scheduled algorithms with uniform dependencies

Computer Languages, Systems and Structures
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture

Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The basic idea behind software pipelining was first developed by Patel and Davidson for scheduling hardware pipe-lines. As instruction-level parallelism made its way into general-purpose computing, it became necessary to automate scheduling. How and whether instructions can be scheduled statically have major ramifications on the design of computer architectures. Rau and Glaeser were the first to use software pipelining in a compiler for a machine with specialized hardware designed to support software pipelining. In the meantime, trace scheduling was touted to be the scheduling technique of choice for VLIW (Very Long Instruction Word) machines. The most important contribution from this paper is to show that software pipelining is effective on VLIW machines without complicated hardware support. Our understanding of software pipelining subsequently deepened with the work of many others. And today, software pipelining is used in all advanced compilers for machines with instruction-level parallelism, none of which, except the Intel Itanium, relies on any specialized support for software pipelining.This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.