MIRS: modulo scheduling with integrated register spilling

Authors:
Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
Affiliation:
Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya (all authors)
Venue:
LCPC '01: Proceedings of the 14th International Conference on Languages and Compilers for Parallel Computing
Year:
2001

Abstract

The overlapping of loop iterations in software pipelining imposes high register requirements. A schedule for a loop is valid only if it requires no more registers than the target architecture provides; otherwise, its register requirements have to be reduced by spilling registers to memory. Previous proposals for spilling in software pipelined loops use a two-step process. The first step performs the actual instruction scheduling without register constraints; the second step adds spill code (if required) and reschedules the modified loop. The process is repeated until a valid schedule, one requiring no more registers than those available, is found. This paper presents MIRS (Modulo scheduling with Integrated Register Spilling), a novel register-constrained modulo scheduler that performs modulo scheduling and register spilling simultaneously in a single step. The algorithm is iterative and uses backtracking to undo previous scheduling decisions whenever resource or dependence conflicts appear. MIRS is compared against a state-of-the-art two-step approach already described in the literature, using a workbench composed of a large set of loops from the Perfect Club benchmarks and a set of processor configurations. On average, for the loops that require spill code, MIRS achieves a speed-up in the range 14-31% and reduces memory traffic by a factor in the range 0.90-0.72.
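To make "register requirements" and "spilling" concrete, here is a minimal sketch under stated assumptions; it is not code from the paper. It treats a schedule's register requirement as MaxLive: the largest number of values simultaneously live in any modulo slot once iterations overlap every `ii` cycles. The functions `max_live`, `is_valid` and `spill_until_fits` are hypothetical names. The spiller is a deliberately naive greedy heuristic (spill the longest lifetime first) that ignores the cost of the added store/load operations, so it only illustrates the idea behind the "add spill code" step of a two-step approach, not the MIRS heuristics.

```python
from collections import Counter


def max_live(lifetimes, ii):
    """Register requirement (MaxLive) of a modulo schedule with initiation interval ii.

    lifetimes: (start, end) cycle pairs for the values of ONE iteration; a value
    occupies a register during [start, end). A new iteration starts every ii
    cycles, so cycle t of a lifetime lands in modulo slot t % ii; summing per
    slot gives the number of live values in the steady state.
    """
    live = Counter()
    for start, end in lifetimes:
        for t in range(start, end):
            live[t % ii] += 1
    return max(live.values(), default=0)


def is_valid(lifetimes, ii, num_regs):
    # Assumption: MaxLive registers are enough; a real allocator may need a few more.
    return max_live(lifetimes, ii) <= num_regs


def spill_until_fits(lifetimes, ii, num_regs, keep=1):
    """Hypothetical greedy spiller: repeatedly spill the longest lifetime.

    A spilled value keeps a register only `keep` cycles after its definition
    (until the store) and `keep` cycles before its use (after the reload).
    The extra memory operations, which could force a larger ii, are ignored.
    Returns (new_lifetimes, spilled), or (None, spilled) if spilling cannot help.
    """
    lifetimes = list(lifetimes)
    spilled = []
    while not is_valid(lifetimes, ii, num_regs):
        # Only lifetimes longer than 2*keep get shorter when split.
        candidates = [i for i, (s, e) in enumerate(lifetimes) if e - s > 2 * keep]
        if not candidates:
            return None, spilled
        i = max(candidates, key=lambda j: lifetimes[j][1] - lifetimes[j][0])
        s, e = lifetimes.pop(i)
        spilled.append((s, e))
        lifetimes += [(s, s + keep), (e - keep, e)]
    return lifetimes, spilled


if __name__ == "__main__":
    loop = [(0, 8), (1, 3), (2, 6), (4, 5)]
    print(max_live(loop, 2))             # 8
    print(spill_until_fits(loop, 2, 5))  # ([(1, 3), (2, 6), (4, 5), (0, 1), (7, 8)], [(0, 8)])
```

With 5 registers, spilling the single long lifetime `(0, 8)` brings MaxLive from 8 down to 5; with only 3 registers the sketch gives up, which is where a real scheduler would have to reschedule, for example with a larger `ii`.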
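The backtracking idea (undoing earlier placements when a resource or dependence conflict appears) can be illustrated with a simplified scheduler in the style of Rau's Iterative Modulo Scheduling. This is a sketch under assumptions, not MIRS: it has no register constraints and no spilling, it evicts every occupant of a full resource slot rather than choosing victims carefully, and the names `iterative_modulo_schedule`, `resource`, `capacity` and `edges` are hypothetical. Each edge `(src, dst, latency, distance)` imposes `time[dst] >= time[src] + latency - distance * ii`; self-loop edges are assumed to be satisfied by the chosen `ii`.

```python
def iterative_modulo_schedule(ops, resource, capacity, edges, ii, budget=200):
    """Simplified iterative modulo scheduling with backtracking (eviction).

    ops: op names in priority order; resource[op]: functional-unit class;
    capacity[r]: number of units of class r; edges: (src, dst, latency, distance).
    Returns {op: cycle}, or None when the budget runs out (caller tries ii + 1).
    """
    time = {}       # currently scheduled ops -> cycle
    last_try = {}   # last cycle each op was placed at, to guarantee progress
    pending = list(ops)

    def slot_users(op, t):
        # ops of the same resource class already occupying modulo slot t % ii
        r = resource[op]
        return [o for o in time if resource[o] == r and time[o] % ii == t % ii]

    def earliest(op):
        # earliest start allowed by the predecessors that are scheduled now
        bounds = [time[s] + lat - dist * ii
                  for (s, d, lat, dist) in edges if d == op and s in time]
        return max([0] + bounds)

    while pending:
        if budget == 0:
            return None
        budget -= 1
        op = pending.pop(0)
        est = earliest(op)
        # an ii-wide window covers every modulo slot exactly once
        free = [t for t in range(est, est + ii)
                if len(slot_users(op, t)) < capacity[resource[op]]]
        if free:
            t = free[0]
        else:
            # resource conflict: force a cycle (never repeating the last one)
            # and backtrack by evicting the ops holding that slot
            t = est if op not in last_try or est > last_try[op] else last_try[op] + 1
            for victim in slot_users(op, t):
                del time[victim]
                pending.append(victim)
        time[op] = t
        last_try[op] = t
        # dependence conflict: backtrack by unscheduling successors now too early
        for (s, d, lat, dist) in edges:
            if s == op and d != op and d in time and time[d] < t + lat - dist * ii:
                del time[d]
                pending.append(d)
    return time


if __name__ == "__main__":
    # Recurrence add -> mul (next iteration) -> add: needs ii >= (3 + 1) / 1 = 4.
    res = {"add": "alu", "mul": "alu"}
    deps = [("mul", "add", 3, 0), ("add", "mul", 1, 1)]
    print(iterative_modulo_schedule(["add", "mul"], res, {"alu": 1}, deps, 4))
    print(iterative_modulo_schedule(["add", "mul"], res, {"alu": 1}, deps, 3))  # None
```

In the first call, `add` is placed first, then evicted when `mul` is scheduled and the `mul -> add` latency is violated; it is rescheduled later and the schedule converges. At `ii = 3` the recurrence cannot be met, so the placements keep chasing each other until the budget is exhausted.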