Advanced Computer Architectures
Source-level loop optimization for DSP code generation
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
This paper considers code optimization for the novel TS1xx processor from Analog Devices. Very long instruction word (VLIW) architectures, such as the TS1xx, represent the state of the art in high-performance signal processing. The theoretically achievable peak performance of VLIW processors increases steadily with the degree of on-chip parallelism. It is demonstrated that C compiler technology cannot achieve peak computing rates on a statically scheduled processor, so the applications programmer must rely on hand-optimized assembly libraries. Writing such libraries requires intimate knowledge of the specific compiler optimization techniques as well as of the underlying hardware. Compiler-friendly code optimized by the VisualC2.0 compiler is compared against hand-optimized assembly code for a common operation involving a loop with multiple memory accesses, floating-point arithmetic, and pointer operations. It is found that mature C code for matrix-vector multiplication executes in roughly 1.18*n*m cycles, whereas the same operation optimized in assembly has a cycle complexity of 0.5*n*(m+16). For large m, the assembly version therefore needs fewer cycles by a factor approaching 1.18/0.5, or about 2.4.
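To make the benchmarked operation concrete, the following is a minimal sketch of a plain C matrix-vector multiplication together with the two cycle estimates quoted in the abstract. It is not the paper's code: the function names (`matvec`, `cycles_c`, `cycles_asm`, `speedup`), the row-major layout, and the assumption that n counts rows and m counts columns are all illustrative choices, since the abstract does not define them.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C matrix-vector product y = A * x.
 * Assumption: A is n rows by m columns, stored row-major. */
void matvec(size_t n, size_t m, const float *a, const float *x, float *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        const float *row = a + i * m;   /* start of row i */
        float acc = 0.0f;
        for (size_t j = 0; j < m; ++j) {
            acc += row[j] * x[j];       /* two loads, one multiply-add */
        }
        y[i] = acc;
    }
}

/* Cycle estimate quoted for the compiled C version: 1.18*n*m. */
double cycles_c(size_t n, size_t m)
{
    return 1.18 * (double)n * (double)m;
}

/* Cycle estimate quoted for the hand-optimized version: 0.5*n*(m+16). */
double cycles_asm(size_t n, size_t m)
{
    return 0.5 * (double)n * ((double)m + 16.0);
}

/* Ratio of the two estimates; values above 1 favor the assembly version. */
double speedup(size_t n, size_t m)
{
    return cycles_c(n, m) / cycles_asm(n, m);
}
```

Under these two formulas, `speedup(64, 64)` evaluates to about 1.89, and the ratio approaches 2.36 as m grows. The fixed term of 16 in the assembly estimate means the formulas only favor the assembly version once m exceeds roughly 12 (where 1.18*m equals 0.5*(m+16)).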