This paper describes complementary software- and hardware-based approaches for handling the overlapping register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N instructions in a loop body and constructs an M-stage software pipeline. The length of each stage in the software pipeline is the Initiation Interval (II), which is the number of cycles between the starts of successive loop iterations. An overlapped lifetime has a live range longer than the II; as a consequence, the current iteration writes a new value to a register before a previous loop iteration has finished using the old value. Hardware and software solutions for dealing with overlapped lifetimes have been proposed by researchers and implemented in commercial products. These solutions include rotating register files, register queues, modulo variable expansion, and post-scheduling live range splitting. Each of these approaches has drawbacks for embedded systems, such as increased silicon area, power consumption, or code size. Our approach improves on these solutions by preventing overlapped lifetimes through a combination of hardware and software techniques. The hardware element implements a register assignment latency that allows multiple in-flight writes to be pending to the same register. The software element uses dependence analysis and a constrained modulo scheduling algorithm to prevent overlapped lifetimes. We describe how to use these hardware and software techniques during modulo scheduling. Finally, we present the results of using our approach to compile embedded application code, reported in terms of modulo schedule quality and application performance.
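The definition of an overlapped lifetime given above can be made concrete with a small sketch. It is not the paper's algorithm: the function names are hypothetical, lifetimes are measured as the cycle distance from a value's definition to its last use within one iteration's schedule, and the copy count uses the usual modulo variable expansion estimate of ceil(lifetime / II), which is an assumption about how that technique is applied here.

```python
from math import ceil


def stage_count(schedule_length: int, ii: int) -> int:
    """Number of pipeline stages M when one iteration's schedule spans
    `schedule_length` cycles and each stage is `ii` cycles long."""
    return ceil(schedule_length / ii)


def is_overlapped(def_cycle: int, last_use_cycle: int, ii: int) -> bool:
    """A lifetime is overlapped when its live range is longer than the II:
    the next iteration redefines the register before this one's last use."""
    return (last_use_cycle - def_cycle) > ii


def copies_needed(def_cycle: int, last_use_cycle: int, ii: int) -> int:
    """Assumed estimate of how many register copies modulo variable
    expansion would need so that no live value is overwritten."""
    lifetime = last_use_cycle - def_cycle
    return max(1, ceil(lifetime / ii))


if __name__ == "__main__":
    ii = 2  # hypothetical initiation interval, in cycles
    # Hypothetical values: (name, definition cycle, last-use cycle)
    for name, d, u in [("a", 0, 5), ("b", 0, 2), ("c", 3, 4)]:
        print(name, is_overlapped(d, u, ii), copies_needed(d, u, ii))
```

With an II of 2, a value defined at cycle 0 and last used at cycle 5 is overlapped, while one whose live range is exactly 2 cycles is not. The approach described in the abstract aims to avoid such overlaps during scheduling rather than to allocate extra copies afterwards.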