Early control of register pressure for software pipelined loops

Authors: Sid-Ahmed-Ali Touati; Christine Eisenbeis
Affiliations: INRIA Rocquencourt, Le Chesnay, France (both authors)
Venue: CC'03 Proceedings of the 12th international conference on Compiler construction
Year: 2003

References (14)

Overlapped loop support in the Cydra 5. ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Register allocation for software pipelined loops. PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Lifetime-sensitive modulo scheduling. PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A novel framework of register allocation for software pipelining. POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Decomposed software pipelining with reduced register requirement. PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Minimizing register requirements of a modulo schedule via optimum stage scheduling. International Journal of Parallel Programming
Schedule-independent storage mapping for loops. Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
On a graph-theoretical model for cyclic register allocation. Discrete Applied Mathematics
Optimal acyclic fine-grain scheduling with cache effects for embedded and real time systems. Proceedings of the ninth international symposium on Hardware/software codesign
A unified framework for schedule and storage optimization. Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
RESIS: A New Methodology for Register Optimization in Software Pipelining. Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures. PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs. CC '92 Proceedings of the 4th International Conference on Compiler Construction
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design. HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Cited by (2)

Code-size conscious pipelining of imperfectly nested loops. MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Software Pipelining in Nested Loops with Prolog-Epilog Merging. HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers

Abstract

Register allocation in loops is generally performed during or after software pipelining. This is because a conventional register allocation performed as a first step, without assuming a schedule, lacks information about the interferences between variable lifetime intervals. As a result, the register allocator may introduce an excessive number of false dependences that dramatically reduce the ILP (Instruction Level Parallelism). We present a new framework for controlling register pressure before software pipelining. It is based on inserting anti-dependence edges (register reuse edges), labeled with reuse distances, directly into the data dependence graph. In this new graph, we can guarantee that the number of simultaneously alive variables in any schedule does not exceed a given limit. The determination of register reuse and reuse distances is parameterized by the desired critical circuit ratio (MII) as well as by the register pressure constraints: either one can be minimized while the other is fixed. After scheduling, register allocation is done cyclically on conventional register sets or on rotating register files. We give an optimal exact model, and an approximate one that generalizes the Ning-Gao [13] buffer optimization heuristics.
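
To make the trade-off between reuse distances and the critical circuit ratio concrete, the following is a minimal illustrative sketch, not the authors' exact or approximate model. It assumes a data dependence graph given as a list of (source, destination, latency, distance) edges, and it models a register reuse edge as an ordinary edge from the last consumer of a value back to its producer; the latency of 1 on that edge and the names `has_positive_circuit` and `recurrence_mii` are assumptions made for illustration. The MII is taken to be the smallest integer II such that no circuit has total latency exceeding II times its total distance.

```python
from typing import List, Tuple

# (source, destination, latency, distance); distance is in loop iterations.
Edge = Tuple[int, int, int, int]


def has_positive_circuit(n: int, edges: List[Edge], ii: int) -> bool:
    """True if some circuit has sum(latency) - ii * sum(distance) > 0."""
    # Longest-path relaxation (Bellman-Ford style), every node starts at 0.
    potential = [0] * n
    for _ in range(n):
        changed = False
        for u, v, lat, dist in edges:
            w = lat - ii * dist
            if potential[u] + w > potential[v]:
                potential[v] = potential[u] + w
                changed = True
        if not changed:
            return False  # converged, so no positive circuit exists
    return True  # still relaxing after n passes: a positive circuit exists


def recurrence_mii(n: int, edges: List[Edge]) -> int:
    """Smallest integer II for which the graph has no positive circuit."""
    # Any circuit with distance >= 1 is satisfied once II reaches this bound.
    bound = max(1, sum(max(lat, 0) for _, _, lat, _ in edges))
    for ii in range(1, bound + 1):
        if not has_positive_circuit(n, edges, ii):
            return ii
    raise ValueError("circuit with zero total distance: no valid II")


# Hypothetical two-operation loop: a (node 0) produces a value read by b (node 1),
# and b feeds a again two iterations later.
base = [(0, 1, 3, 0), (1, 0, 1, 2)]

# Reuse edges from b back to a, with reuse distance 1 or 2 (assumed latency 1).
reuse_1 = base + [(1, 0, 1, 1)]
reuse_2 = base + [(1, 0, 1, 2)]

mii_base = recurrence_mii(2, base)
mii_reuse_1 = recurrence_mii(2, reuse_1)
mii_reuse_2 = recurrence_mii(2, reuse_2)
```

In this toy graph, a reuse distance of 1 creates a circuit with latency 4 and distance 1, so the MII grows from 2 to 4, while a reuse distance of 2 leaves it at 2. This mirrors the parameterization described in the abstract: tightening the register constraint can lengthen the critical circuit, and fixing the MII limits how small the reuse distances can be.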