Post-pass periodic register allocation to minimise loop unrolling degree

Authors:
Mounira Bachir;Sid-Ahmed-Ali Touati;Albert Cohen
Affiliations:
INRIA Saclay, Ile de France, France;University of Versailles Saint-Quentin-en-Yvelines, Ile de France, France;INRIA Saclay, Ile de France, France
Venue:
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2008

References (8):
Software pipelining: an effective scheduling technique for VLIW machines. PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5. ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Lifetime-sensitive modulo scheduling. PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The meeting graph: a new model for loop cyclic register allocation. PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
On a graph-theoretical model for cyclic register allocation. Discrete Applied Mathematics
Register Allocation, Renaming and Their Impact on Fine-Grain Parallelism. Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A Unified Software Pipeline Construction Scheme for Modulo Scheduled Loops. PaCT '97 Proceedings of the 4th International Conference on Parallel Computing Technologies
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs. CC '92 Proceedings of the 4th International Conference on Compiler Construction

Cited by (1):
Using the meeting graph framework to minimise kernel loop unrolling for scheduled loops. LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Abstract

This paper solves an open problem regarding loop unrolling after periodic register allocation. Although software pipelining is a powerful technique for extracting fine-grain parallelism, it generates reuse circuits spanning multiple loop iterations. These circuits require periodic register allocation, which in turn yields a code generation challenge, generally addressed through: (1) hardware support (rotating register files), deemed too expensive for embedded processors; (2) insertion of register moves, with a high risk of reducing the computation throughput, measured by the initiation interval (II), of software pipelining; and (3) post-pass loop unrolling, which does not compromise throughput but often leads to impractical code growth. The last approach relies on the proof that MAXLIVE registers are sufficient for periodic register allocation (2; 3; 5); yet the only heuristic to control the amount of post-pass loop unrolling does not achieve this bound and leads to undesired register spills (4; 7). We propose a periodic register allocation technique allowing software-only code generation that does not trade the optimality of the II for compactness of the generated code. Our idea is to exploit the remaining registers: if Rarch denotes the number of architectural registers of the target processor, then the number of remaining registers that can be used to minimise the unrolling degree is Rarch - MAXLIVE. We provide a complete formalisation of the problem and algorithm, followed by extensive experiments. We achieve practical loop unrolling degrees in most cases, with no increase of the II, whereas state-of-the-art techniques would either induce register spilling, degrade the II, or lead to unacceptable code growth.
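
The role of the Rarch - MAXLIVE remaining registers can be illustrated with a small sketch. It rests on assumptions that the abstract does not state: that, as in the meeting-graph model, the unrolling degree is the least common multiple of the reuse-circuit weights, that giving one extra register to a circuit raises its weight by one, and that the initial allocation uses exactly MAXLIVE registers. The function name, the example weights and the register count are hypothetical, and the exhaustive search is only for illustration; it is not the algorithm proposed in the paper.

```python
from itertools import product
from math import lcm  # multi-argument lcm requires Python 3.9+


def min_unroll_degree(weights, remaining):
    """Exhaustively distribute at most `remaining` extra registers over the
    reuse circuits and return (smallest lcm found, extra registers per circuit).

    Assumption: unrolling degree = lcm of the circuit weights, and one extra
    register on a circuit increases its weight by one.
    """
    best_degree = lcm(*weights)
    best_extra = (0,) * len(weights)
    for extra in product(range(remaining + 1), repeat=len(weights)):
        if sum(extra) > remaining:
            continue  # would need more registers than are left over
        degree = lcm(*(w + r for w, r in zip(weights, extra)))
        if degree < best_degree:
            best_degree, best_extra = degree, extra
    return best_degree, best_extra


# Hypothetical loop: three reuse circuits of weights 3, 4 and 5.
weights = [3, 4, 5]
R_ARCH = 16                    # hypothetical architectural register count
MAXLIVE = sum(weights)         # assumed: the allocation uses exactly MAXLIVE = 12
remaining = R_ARCH - MAXLIVE   # 4 registers left to spend

print(lcm(*weights))                          # 60 without extra registers
print(min_unroll_degree(weights, remaining))  # (5, (2, 1, 0))
```

In this toy instance, spending three of the four remaining registers brings every circuit weight to 5, so the unrolling degree falls from 60 to 5 without touching the II.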