Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Sentinel scheduling for VLIW and superscalar processors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Three Architectural Models for Compiler-Controlled Speculative Execution
IEEE Transactions on Computers
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Hi-index | 0.00 |
The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) loops. More recently, software pipelining has been extended to irregular (WHILE) loops, but only on architectures that provide special-purpose hardware such as rotating (predicate and general-purpose) register files, specific instructions for filling/draining software pipelined loops, and possibly hardware support for speculative code motion. In contrast, the TMS320C6000 family has a limited, static register file and no specialized hardware beyond the ability to predicate instructions using a few static registers. In this paper, we describe our experience extending a production compiler for the TMS320C6000 family to software pipeline irregular loops. We discuss our technique for preprocessing irregular loops so that they can be handled by the existing software pipeliner. Our approach is much simpler than previous approaches and works very well in the presence of the DSP applications and the target architecture which characterize our environment. With this optimization, we achieve impressive speedups on several key DSP and non-DSP algorithms.