Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
Instruction selection for embedded DSPs with complex instructions
EURO-DAC '96/EURO-VHDL '96 Proceedings of the conference on European design automation
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Random Trees and the Analysis of Branch and Bound Procedures
Journal of the ACM (JACM)
An efficient search algorithm to find the elementary circuits of a graph
Communications of the ACM
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Enhancing loop buffering of media and telecommunications applications using low-overhead predication
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An Integer Linear Programming Model of Software Pipelining for the MIPS R8000 Processor
PaCT '97 Proceedings of the 4th International Conference on Parallel Computing Technologies
A New Approach to DSP Intrinsic Functions
HICSS '00 Proceedings of the 33rd Hawaii International Conference on System Sciences-Volume 8 - Volume 8
DSP architectures: past, present and futures
ACM SIGARCH Computer Architecture News
Compiler transformations for effectively exploiting a zero overhead loop buffer
Software—Practice & Experience
Algorithms in C
Hi-index | 0.00 |
To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic data dependences by exploiting functional units which are otherwise left unused. Since our preprocessing strategy requires neither code duplication nor additional hardware support, it is relatively easy to implement in DSP compilers. The strategy proposed has been validated by an implementation for a StarCore SC140 optimizing C compiler.