Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Selected papers of the second workshop on Languages and compilers for parallel computing
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Parallel and vector computing: a practical introduction
Parallel and vector computing: a practical introduction
ACM Computing Surveys (CSUR)
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Realistic scheduling: compaction for pipelined architectures
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Improving the throughput of a pipeline by insertion of delays
ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
ACM SIGPLAN Notices
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Affinity-based cluster assignment for unrolled loops
ICS '02 Proceedings of the 16th international conference on Supercomputing
Real-Time Imaging - Special issue on software engineering
Complementing software pipelining with software thread integration
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiler transformations for effectively exploiting a zero overhead loop buffer
Software—Practice & Experience
Generic software pipelining at the assembly level
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Integrated Modulo Scheduling for Clustered VLIW Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Preprocessing strategy for effective modulo scheduling on multi-issue digital signal processors
CC'07 Proceedings of the 16th international conference on Compiler construction
Compilation strategies for reducing code size on a VLIW processor with variable length instructions
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Hi-index | 0.00 |
Digital Signal Processing (DSP) architectures are specialized for high performance numerical algorithms such as those found in communication and multimedia applications. The development of efficient compilers for DSP processors is a growing research area. The Texas Instruments TMS320C6x (C6x) is a Very Long Instruction Word (VLIW) DSP architecture capable of issuing eight operations in parallel. In this paper, we present the results of implementing a software pipelining algorithm for the C6x. We provide a description of the C6x and detail the architectural features that impact software pipelining such as a moderately sized register file, constraints on code size, homogeneous resources, and multiple assignment code. We discuss how we adapted modulo scheduling to implement software pipelining for the C6x. Finally, we present the results of modulo scheduling a set of 40 loop kernel benchmarks and measure the algorithm in terms of schedule quality and algorithm complexity.