Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Register allocation with instruction scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Decomposed software pipelining with reduced register requirement
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Allocating registers in multiple instruction-issuing processors
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Minimizing register requirements of a modulo schedule via optimum stage scheduling
International Journal of Parallel Programming
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
On a graph-theoretical model for cyclic register allocation
Discrete Applied Mathematics
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
IEEE Transactions on Computers
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Register allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Lattice-Based Memory Allocation
IEEE Transactions on Computers
Register saturation in instruction level parallelism
International Journal of Parallel Programming
On Periodic Register Need in Software Pipelining
IEEE Transactions on Computers
CC'07 Proceedings of the 16th international conference on Compiler construction
SCAN: a heuristic for near-optimal software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
SIRALINA: efficient two-steps heuristic for storage optimisation in single period task scheduling
Journal of Combinatorial Optimization
Hi-index | 0.00 |
This article treats register constraints in high performance codes and embedded VLIW computing, aiming to decouple register constraints from instruction scheduling. It extends the register saturation (RS) concept to periodic instruction schedules, i.e., software pipelining (SWP). We formally study an approach which consists in computing the exact upper-bound of the register need for all the valid SWP schedules of an innermost loop (without overestimation), independently of the functional unit constraints. We call this upper-limit the periodic register saturation (PRS) of the data dependence graph (DDG). PRS is well adapted to situations where spilling is not a favourite solution for reducing register pressure compared to ILP scheduling: spill operations request memory data with a higher energy consumption. Also, spill code introduces unpredictable cache effects and makes Worst-Case-Execution-Time (WCET) estimation less accurate. PRS is a computer science concept that has many possible applications. First, it provides compiler designers new ways to generate better codes regarding register optimisation by avoiding useless spilling. Second, its computation can help architectural designers to decide about the most suitable number of available registers inside an embedded VLIW processor; such architectural decision can be done with full respect to instruction level parallelism extraction, independently of the chosen functional units configuration. Third, it can be used to verify prior to instruction scheduling that a code does not require spilling. In this paper, we write an appropriate mathematical formalism for the problem of PRS computation and reduction in the case of loops where data dependence graphs may be cyclic. We prove that PRS reduction is NP-hard and we provide optimal and approximate methods. We have implemented our methods, and our experimental results demonstrate the practical usefulness and efficiency of the PRS approach.