Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Inter-Cluster Communication Models for Clustered VLIW Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Copy propagation optimizations for VLIW DSP processors with distributed register files
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Compiler supports and optimizations for PAC VLIW DSP processors
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Journal of Signal Processing Systems
Hi-index | 0.00 |
High-performance and low-power VLIW DSP processors are increasingly deployed on embedded devices to process video and multimedia applications. For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to eliminate the amount of read/write ports in register files. This presents new challenges for devising compiler optimization schemes for such architectures. In this paper, we address the compiler optimization issues for PAC architecture, which is a 5-way issue DSP processor with distributed register files. We present an integrated flow to address several phases of compiler optimizations in interacting with distributed register files and multi-bank register files in the layer of instruction scheduling, software pipelining, and data flow optimizations. Our experiments on a novel 32-bit embedded VLIW DSP (known as the PAC DSP core) exhibit the state of the art performance for embedded VLIW DSP processors with distributed register files by incorporating our proposed schemes in compilers.