Modulo scheduling for the TMS320C6x VLIW DSP architecture

  • Authors:
  • Eric Stotzer; Ernst Leiss

  • Affiliations:
  • Texas Instruments, PO Box 1443, MS 730, Houston, TX; University of Houston, Dept. of Computer Science, Houston, TX

  • Venue:
  • Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
  • Year:
  • 1999

Abstract

Digital Signal Processing (DSP) architectures are specialized for high-performance numerical algorithms such as those found in communication and multimedia applications. The development of efficient compilers for DSP processors is a growing research area. The Texas Instruments TMS320C6x (C6x) is a Very Long Instruction Word (VLIW) DSP architecture capable of issuing eight operations in parallel. In this paper, we present the results of implementing a software pipelining algorithm for the C6x. We provide a description of the C6x and detail the architectural features that impact software pipelining, such as a moderately sized register file, constraints on code size, homogeneous resources, and multiple-assignment code. We discuss how we adapted modulo scheduling to implement software pipelining for the C6x. Finally, we present the results of modulo scheduling a set of 40 loop kernel benchmarks and measure the algorithm in terms of schedule quality and algorithm complexity.