Enabling compiler flow for embedded VLIW DSP processors with distributed register files

Authors:
Chung-Kai Chen;Ling-Hua Tseng;Shih-Chang Chen;Young-Jia Lin;Yi-Ping You;Chia-Han Lu;Jenq-Kuen Lee
Affiliations:
National Tsing Hua University, Hsinchu, Taiwan (all authors)
Venue:
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2007

Abstract

High-performance, low-power VLIW DSP processors are increasingly deployed in embedded devices to process video and multimedia applications. To reduce power and cost, VLIW DSP designs are adopting distributed register files and multi-bank register architectures, which reduce the number of read/write ports in the register files. These architectures present new challenges for compiler optimization. In this paper, we address the compiler optimization issues for the PAC architecture, a 5-way issue DSP processor with distributed register files. We present an integrated flow that addresses several phases of compiler optimization that interact with distributed and multi-bank register files: instruction scheduling, software pipelining, and data flow optimizations. Our experiments on a novel 32-bit embedded VLIW DSP (known as the PAC DSP core) show that compilers incorporating our proposed schemes achieve state-of-the-art performance for embedded VLIW DSP processors with distributed register files.