Recent digital signal processors (DSPs) have a homogeneous VLIW-like data path architecture, which allows C compilers to generate efficient code. However, some special restrictions still have to be obeyed in code generation for VLIW DSPs. To reduce the number of register file ports needed to supply data to multiple functional units working in parallel, the DSP data path may be clustered into several sub-paths, with very limited means of exchanging values between the clusters. A well-known example is the Texas Instruments C6201 DSP. For such architectures, the tasks of scheduling instructions and partitioning them between the clusters are highly interdependent. This paper presents a new instruction scheduling approach which, in contrast to earlier work, integrates partitioning and scheduling into a single technique to achieve high code quality. We show experimentally that the proposed technique generates more efficient code than a commercial code generator for the TI C6201.
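The interdependence between partitioning and scheduling can be illustrated with a small sketch. The code below is not the method of this paper; it is a simple greedy list scheduler, written under stated assumptions, that picks each operation's cluster and issue cycle together. The machine model is a hypothetical toy (two clusters, two issue slots per cluster, one extra cycle whenever an operand comes from the other cluster) and ignores real C6201 details such as cross-path port limits. The function name `list_schedule` and the `fixed` parameter are illustrative. Passing a precomputed `fixed` cluster assignment imitates a decoupled "partition first, then schedule" flow for comparison.

```python
from collections import defaultdict

# Hypothetical toy machine model (illustrative assumptions, not C6201 parameters).
NUM_CLUSTERS = 2
UNITS_PER_CLUSTER = 2  # operations each cluster can issue per cycle
COPY_DELAY = 1         # extra cycles to use a value produced on another cluster


def list_schedule(ops, preds, latency, fixed=None):
    """Greedy list scheduling on a clustered datapath (illustrative sketch).

    ops     -- operation names in a topological order
    preds   -- dict: op -> predecessor ops whose results it reads
    latency -- dict: op -> cycles until its result is available
    fixed   -- optional dict: op -> cluster chosen beforehand (decoupled
               partitioning); if None, the cluster is chosen during
               scheduling (integrated approach)
    Returns (placement, length), where placement[op] = (cluster, cycle).
    """
    placement = {}
    used = defaultdict(int)  # (cluster, cycle) -> issue slots already taken
    for op in ops:
        clusters = [fixed[op]] if fixed else range(NUM_CLUSTERS)
        best = None
        for c in clusters:
            # Earliest cycle at which all operands are readable on cluster c.
            t = 0
            for p in preds.get(op, []):
                pc, pt = placement[p]
                avail = pt + latency[p] + (COPY_DELAY if pc != c else 0)
                t = max(t, avail)
            # Delay until cluster c has a free issue slot.
            while used[(c, t)] >= UNITS_PER_CLUSTER:
                t += 1
            # Keep the (cluster, cycle) pair with the earliest issue cycle.
            if best is None or t < best[1]:
                best = (c, t)
        used[best] += 1
        placement[op] = best
    length = max(t + latency[op] for op, (_, t) in placement.items())
    return placement, length


# Two independent additions, e = a + b and f = c + d, all with latency 1.
ops = ["a", "b", "c", "d", "e", "f"]
preds = {"e": ["a", "b"], "f": ["c", "d"]}
latency = {op: 1 for op in ops}

_, integrated = list_schedule(ops, preds, latency)
# Naive round-robin partition computed before scheduling (assumed for contrast).
round_robin = {op: i % 2 for i, op in enumerate(ops)}
_, decoupled = list_schedule(ops, preds, latency, fixed=round_robin)
print(integrated, decoupled)  # prints 2 3
```

In this toy example, choosing the cluster during scheduling keeps each addition on the cluster that holds its operands. The fixed round-robin partition splits operand pairs across clusters, so both additions pay the transfer delay and the schedule grows by one cycle.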