Modulo scheduling without overlapped lifetimes

Authors:
Eric J. Stotzer;Ernst L. Leiss
Affiliations:
Texas Instruments, Houston, TX, USA;University of Houston, Houston, TX, USA
Venue:
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2009

Citing 20
Cited 1

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Supercompilers for parallel and vector computers

Supercompilers for parallel and vector computers
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Compiling for the Cydra 5

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Parallel and vector computing: a practical introduction

Parallel and vector computing: a practical introduction
Software pipelining

ACM Computing Surveys (CSUR)
Modulo scheduling for the TMS320C6x VLIW DSP architecture

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Evaluating the Use of Register Queues in Software Pipelined Loops

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Modulo schedule buffers

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Introducing the IA-64 Architecture

IEEE Micro
Data Flow and Dependence Analysis for Instruction Level Parallelism

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Reduced code size modulo scheduling in the absence of hardware support

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Improving the throughput of a pipeline by insertion of delays

ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
Partitioned Schedules for Clustered VLIW Architectures

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Swing Modulo Scheduling: A Lifetime-Sensitive Approach

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Register Constrained Modulo Scheduling

IEEE Transactions on Parallel and Distributed Systems

Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes complementary software- and hardware-based approaches for handling overlapping register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N-instructions in a loop body and constructs an M-stage software pipeline. The length of each stage in the software pipeline is the Initiation Interval (II), which is the rate at which new loop iterations are started. An overlapped lifetime has a live range longer than the II, and as a consequence, the current iteration writes a new value to a register before a previous loop iteration has fin-ished using the old value. Hardware and software solutions for dealing with overlapped lifetimes have been proposed by re-searchers and also implemented in commercial products. These solutions include rotating register files, register queues, modulo variable expansion, and post-scheduling live range splitting. Each of these approaches has drawbacks for embedded systems such as an increase in silicon area, power consumption, and code size. Our approach, which is an improvement to the current solutions, prevents overlapped lifetimes through a combination of hardware and software techniques. The hardware element of our approach implements a register assignment latency that allows multiple in-flight writes to be pending to the same register. The software element of our approach uses dependence analysis and a constrained modulo scheduling algorithm to prevent overlapped lifetimes. We describe how to use these hardware and software techniques during modulo scheduling. Finally, we present the results of using our approach to compile embedded application code and present results in terms of modulo schedule quality and application performance.