Modulo scheduling for the TMS320C6x VLIW DSP architecture

Authors:
Eric Stotzer; Ernst Leiss
Affiliations:
Texas Instruments, PO Box 1443, MS 730, Houston, TX; University of Houston, Dept. of Computer Science, Houston, TX
Venue:
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Year:
1999

Abstract

Digital Signal Processing (DSP) architectures are specialized for high-performance numerical algorithms such as those found in communication and multimedia applications. The development of efficient compilers for DSP processors is a growing research area. The Texas Instruments TMS320C6x (C6x) is a Very Long Instruction Word (VLIW) DSP architecture capable of issuing eight operations in parallel. In this paper, we present the results of implementing a software pipelining algorithm for the C6x. We describe the C6x and detail the architectural features that affect software pipelining, such as a moderately sized register file, constraints on code size, homogeneous resources, and multiple assignment code. We discuss how we adapted modulo scheduling to implement software pipelining for the C6x. Finally, we present the results of modulo scheduling a set of 40 loop kernel benchmarks and evaluate the algorithm in terms of schedule quality and algorithm complexity.