We present a transformational system for extracting parallelism from programs. Our transformations generate code for synchronous parallel computers, such as Very Long Instruction Word (VLIW) and pipelined machines. The transformational system, which is based on percolation scheduling, is simple and uniform. There are four primitive transformations (three that perform code motion, plus loop unrolling) from which all parallelizing algorithms are constructed. Our transformations are studied as a formal system. We define a formal measure of program improvement, and show that our transformations improve programs with respect to the measure. This formal approach yields a number of results on the expressive power of our transformations. Most importantly, we show that it is possible to compute limits of infinite sequences of the primitive transformations. This leads to a number of new algorithms for software pipelining, including: an algorithm that generates optimal code for loops without tests, an algorithm for software pipelining of multiple nested loops, and a general solution to the problem of software pipelining in the presence of tests. Using the four primitives and the limit-taking transformation, it is possible to express the classical parallelization techniques for vector, multiprocessor, and VLIW machines, such as doacross, the wavefront method, loop interchange, trace scheduling, and a simple form of vectorization. Thus, our transformational system can be viewed as a formal foundation for the area of parallelization.
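For readers unfamiliar with the term, the following is a minimal, hand-written sketch of what a software-pipelined loop looks like. It is not the algorithm of the paper, which derives such schedules from percolation-scheduling primitives and a limit-taking transformation. The loop body (load, multiply by 2, store) and the function names `sequential` and `pipelined` are assumptions chosen only for illustration. Python executes the statements one after another, so the code shows the shape of the schedule (prologue, kernel, epilogue) rather than any actual parallel speedup.

```python
def sequential(a):
    """Reference loop: each iteration runs load -> compute -> store in order."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]        # stage 1: load
        y = x * 2       # stage 2: compute
        out[i] = y      # stage 3: store
    return out


def pipelined(a):
    """Same loop, software-pipelined by hand into three overlapping stages."""
    n = len(a)
    if n < 2:
        # Too few iterations to fill the pipeline; fall back to the plain loop.
        return sequential(a)
    out = [0] * n

    # Prologue: fill the pipeline.
    x = a[0]            # load for iteration 0
    y = x * 2           # compute for iteration 0
    x = a[1]            # load for iteration 1

    # Kernel: each step does work for three different iterations.
    # On a VLIW machine these three operations could be issued together,
    # provided each reads its operands before the others overwrite them.
    for i in range(2, n):
        out[i - 2] = y  # store for iteration i-2
        y = x * 2       # compute for iteration i-1
        x = a[i]        # load for iteration i

    # Epilogue: drain the pipeline.
    out[n - 2] = y      # store for iteration n-2
    y = x * 2           # compute for iteration n-1
    out[n - 1] = y      # store for iteration n-1
    return out
```

Both functions return the same list for any input; only the order in which the stages of different iterations are interleaved changes.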