Modulo Scheduling with Reduced Register Pressure

Authors:
Josep Llosa;Mateo Valero;Eduard Ayguadé;Antonio González
Affiliations:
Univ. Politècnica de Catalunya, Barcelona, Spain;Univ. Politècnica de Catalunya, Barcelona, Spain;Univ. Politècnica de Catalunya, Barcelona, Spain;Univ. Politècnica de Catalunya, Barcelona, Spain
Venue:
IEEE Transactions on Computers
Year:
1998

Citing 24
Cited 6

Highly concurrent scalar processing

Highly concurrent scalar processing
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Circular scheduling: a new technique to perform software pipelining

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
GURPR*: a new global software pipelining algorithm

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Register allocation for software pipelined loops

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Register requirements of pipelined processors

ICS '92 Proceedings of the 6th international conference on Supercomputing
Enhanced modulo scheduling for loops with conditional branches

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A novel framework of register allocation for software pipelining

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiling for the Cydra 5

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Decomposed software pipelining: a new perspective and a new approach

International Journal of Parallel Programming
Software pipelining

ACM Computing Surveys (CSUR)
Optimum modulo schedules for minimum register requirements

ICS '95 Proceedings of the 9th international conference on Supercomputing
Modulo scheduling with multiple initiation intervals

Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Heuristics for register-constrained software pipelining

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A Systolic Array Optimizing Compiler

A Systolic Array Optimizing Compiler
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming

Widening resources: a cost-effective technique for aggressive ILP architectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Compiler optimization-space exploration

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
An algorithm to improve parallelism in distributed systems using asynchronous calls

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics

Quantified Score

Hi-index	14.99

Visualization

Abstract

Software pipelining is a scheduling technique that is used by some product compilers in order to expose more instruction level parallelism out of innermost loops. Modulo scheduling refers to a class of algorithms for software pipelining. Most previous research on modulo scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (which is termed II) but has not considered the effect of the register pressure of the produced schedules. The register pressure increases as the instruction level parallelism increases. When the register requirements of a schedule are higher than the available number of registers, the loop must be rescheduled perhaps with a higher II. Therefore, the register pressure has an important impact on the performance of a schedule. This paper presents a novel heuristic modulo scheduling strategy that tries to generate schedules with the lowest II, and, from all the possible schedules with such II, it tries to select that with the lowest register requirements. The proposed method has been implemented in an experimental compiler and has been tested for the Perfect Club benchmarks. The results show that the proposed method achieves an optimal II for at least 97.5 percent of the loops and its compilation time is comparable to a conventional top-down approach, whereas the register requirements are lower. In addition, the proposed method is compared with some other existing methods. The results indicate that the proposed method performs better than other heuristic methods and almost as well as linear programming methods, which obtain optimal solutions but are impractical for product compilers because their computing cost grows exponentially with the number of operations in the loop body.