Software pipelining showdown: optimal vs. heuristic methods in a production compiler

Authors:
John Ruttenberg;G. R. Gao;A. Stoutchinin;W. Lichtenstein
Affiliations:
Silicon Graphics Inc., 2011 N. Shoreline Blvd., Mountain View, CA;McGill University - School of Computer Science, 3480 University St., McConnell Building, Room 318, Montreal, Canada H3A2A7;McGill University - School of Computer Science, 3480 University St., McConnell Building, Room 318, Montreal, Canada H3A2A7;Silicon Graphics Inc., 2011 N. Shoreline Blvd., Mountain View, CA
Venue:
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Year:
1996

Citing 28
Cited 37

Integer and combinatorial optimization

Optimal loop parallelization
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation

Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation

Coloring heuristics for register allocation
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation

The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing

An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture

Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture

Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

A novel framework of register allocation for software pipelining
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Register allocation via graph coloring

Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism

Compiling for the Cydra 5
The Journal of Supercomputing - Special issue on instruction-level parallelism

Designing the TFP Microprocessor
IEEE Micro

Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture

Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture

Scheduling and mapping: software pipelining in the presence of structural hazards
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Optimum modulo schedules for minimum register requirements
ICS '95 Proceedings of the 9th international conference on Supercomputing

Optimal software pipelining with function unit and register constraints

A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction

A Systolic Array Optimizing Compiler

Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages

Computers and Intractability: A Guide to the Theory of NP-Completeness

Loop Storage Optimization for Dataflow Machines
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing

Fine-Grain Scheduling under Resource Constraints
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing

A Framework for Resource-Constrained Rate-Optimal Software Pipelining
CONPAR 94 - VAPP VI Proceedings of the Third Joint International Conference on Vector and Parallel Processing: Parallel Processing

Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques

Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming

Efficient Algorithms for Cyclic Scheduling

Code reuse in an optimizing compiler
Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications

Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Efficient formulation for optimal modulo schedulers
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation

RECOD: a retiming heuristic to optimize resource and memory utilization in HW/SW codesigns
Proceedings of the 6th international workshop on Hardware/software codesign

Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming

Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems

Using profiling to reduce branch misprediction costs on a dynamically scheduled processor
Proceedings of the 14th international conference on Supercomputing

Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers

Compiling with code-size constraints
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems

On achieving balanced power consumption in software pipelined loops
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems

Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming

Compilers for Instruction-Level Parallelism
Computer

PROPAN: A Retargetable System for Postpass Optimisations and Analyses
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems

Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction

Speculative Prefetching of Induction Pointers
CC '01 Proceedings of the 10th International Conference on Compiler Construction

Selective Guarded Execution Using Profiling on a Dynamically Scheduled Processor
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture

Efficient spill code for SDRAM
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems

Compiling with code-size constraints
ACM Transactions on Embedded Computing Systems (TECS)

Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems

SPOT: development tool for software pipeline optimization for VLIW-DSPs used in real-time image processing
Real-Time Imaging - Special issue on software engineering

Software pipelining: an effective scheduling technique for VLIW machines
ACM SIGPLAN Notices - Best of PLDI 1979-1999

Differential register allocation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming

A global progressive register allocator
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation

Allocating architected registers through differential encoding
ACM Transactions on Programming Languages and Systems (TOPLAS)

Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems

Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing

Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors
ACM Transactions on Embedded Computing Systems (TECS)

Register pressure in software-pipelined loop nests: fast computation and impact on architecture design
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)

Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)

Allocating rotating registers by scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Abstract

This paper is a scientific comparison of two code generation techniques with identical goals: generation of the best possible software pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of software pipelines pioneered by Rau and Glaeser [RaGl81], but are otherwise quite dissimilar.

One technique was developed at Silicon Graphics and is used in the MIPSpro compiler. This is the production compiler for SGI's systems that are based on the MIPS R8000 processor [Hsu94]. It is essentially a branch-and-bound enumeration of possible schedules with extensive pruning. This method is heuristic because of the way it prunes and also because of the interaction between register allocation and scheduling.

The second technique aims to produce optimal results by formulating the scheduling and register allocation problem as an integrated integer linear programming (ILP) problem. This idea has received much recent exposure in the literature [AlGoGa95, Feautrier94, GoAlGa94a, GoAlGa94b, Eichenberger95], but to our knowledge all previous implementations have been too preliminary for detailed measurement and evaluation. In particular, we believe this to be the first published measurement of runtime performance for ILP-based generation of software pipelines.

A particularly valuable result of this study was the evaluation of the heuristic pipelining technology in the SGI compiler. One of the motivations behind the McGill research was the hope that optimal software pipelining, while not in itself practical for use in production compilers, would be useful for their evaluation and validation. Our comparison has indeed provided a quantitative validation of the SGI compiler's pipeliner, leading us to increased confidence in both techniques.
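
For readers unfamiliar with modulo scheduling, the following is a minimal illustrative sketch of the general idea both techniques build on: find the smallest initiation interval (II) at which every operation of the loop body fits into a modulo reservation table without violating dependences. It is not the SGI pruning strategy and not the McGill ILP formulation, and it ignores register allocation entirely. The function name `modulo_schedule`, the four-operation loop body, the latencies, the unit counts, and the assumption that each operation occupies one unit for one cycle are all hypothetical choices made for illustration.

```python
import math

def modulo_schedule(ops, deps, units, max_ii=16):
    """Toy exhaustive modulo scheduler (illustrative only).

    ops:   {op name: resource class}
    deps:  [(src, dst, latency, iteration distance)]
    units: {resource class: number of units}
    Assumes each op occupies one unit of its class for one cycle.
    Returns (ii, {op: start cycle}) or None if nothing fits up to max_ii.
    """
    # Resource-based lower bound on II.
    demand = {}
    for res in ops.values():
        demand[res] = demand.get(res, 0) + 1
    res_mii = max(math.ceil(n / units[r]) for r, n in demand.items())

    # Hypothetical search window for start cycles; a real scheduler
    # would derive tighter bounds per operation.
    window = sum(lat for _, _, lat, _ in deps) + 1
    names = list(ops)

    def deps_ok(start, ii):
        # Check only dependences whose two ends are already placed.
        for src, dst, lat, dist in deps:
            if src in start and dst in start:
                if start[dst] + ii * dist < start[src] + lat:
                    return False
        return True

    def search(i, start, table, ii):
        if i == len(names):
            return dict(start)
        op = names[i]
        res = ops[op]
        for t in range(window + ii):
            slot = (res, t % ii)
            if table.get(slot, 0) >= units[res]:
                continue  # modulo reservation table slot is full
            start[op] = t
            if deps_ok(start, ii):
                table[slot] = table.get(slot, 0) + 1
                found = search(i + 1, start, table, ii)
                if found is not None:
                    return found
                table[slot] -= 1  # backtrack
            del start[op]
        return None

    for ii in range(res_mii, max_ii + 1):
        found = search(0, {}, {}, ii)
        if found is not None:
            return ii, found
    return None

# Hypothetical loop body: acc += a[i] * c, then store acc.
ops = {"load": "mem", "mul": "fpu", "add": "fpu", "store": "mem"}
units = {"mem": 1, "fpu": 1}
deps = [
    ("load", "mul", 2, 0),
    ("mul", "add", 3, 0),
    ("add", "store", 1, 0),
    ("add", "add", 1, 1),  # loop-carried recurrence on the accumulator
]
ii, schedule = modulo_schedule(ops, deps, units)
print(ii, schedule)
```

In this toy setting the search starts at the resource bound and tries larger II values only when the enumeration fails, so the first II it returns is minimal within the chosen search window. The techniques compared in the paper differ in how they explore this space and in how they account for register pressure, neither of which this sketch attempts to model.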