Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Authors:
Elana Granston;Eric Stotzer;Joe Zbiciak
Affiliations:
Texas Instrument Incorporated, Houston, TX;Texas Instrument Incorporated, Houston, TX;Texas Instrument Incorporated, Houston, TX
Venue:
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Year:
2001

Citing 8
Cited 0

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Sentinel scheduling for VLIW and superscalar processors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Code generation schema for modulo scheduled loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for the TMS320C6x VLIW DSP architecture

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming

Quantified Score

Hi-index	0.00

Visualization

Abstract

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) loops. More recently, software pipelining has been extended to irregular (WHILE) loops, but only on architectures that provide special-purpose hardware such as rotating (predicate and general-purpose) register files, specific instructions for filling/draining software pipelined loops, and possibly hardware support for speculative code motion. In contrast, the TMS320C6000 family has a limited, static register file and no specialized hardware beyond the ability to predicate instructions using a few static registers. In this paper, we describe our experience extending a production compiler for the TMS320C6000 family to software pipeline irregular loops. We discuss our technique for preprocessing irregular loops so that they can be handled by the existing software pipeliner. Our approach is much simpler than previous approaches and works very well in the presence of the DSP applications and the target architecture which characterize our environment. With this optimization, we achieve impressive speedups on several key DSP and non-DSP algorithms.