Code Optimization Techniques of Data-Intensive Tasks onto Statically Scheduled Architectures: Optimal Performance on the TigerSharc

  • Authors:
  • Norbert A. Pilz; Kenneth Adamson

  • Venue:
  • PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing: Advanced Scientific Computing
  • Year:
  • 2002

Abstract

This paper considers code optimization on the novel TS1xx processor from Analog Devices. Very long instruction word (VLIW) architectures, such as the TS1xx, represent the state of the art in high-performance signal processing. The theoretically achievable peak performance of VLIW processors increases steadily with the use of on-chip parallelism. It is demonstrated that C compiler technology cannot achieve peak computing rates on a statically scheduled processor, so the applications programmer must rely on hand-optimized assembler libraries. This necessitates intimate knowledge of the specific compiler optimization techniques as well as the underlying hardware. Compiler-friendly code optimized by the VisualC2.0 compiler is compared against hand-optimized assembler code for a common operation involving a loop with multiple memory accesses, floating-point arithmetic, and pointer operations. It is found that mature C code for matrix-vector multiplication executes in roughly 1.18*n*m cycles, whereas the same operation optimized in assembler has a cycle complexity of 0.5*n*(m+16), a reduction in cycle count that approaches a factor of 2.36 as m grows.
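To make the two cycle estimates concrete, the following is a minimal sketch in portable C, not the authors' benchmark code. It assumes that n is the number of matrix rows (the outer loop count) and m the number of columns (the inner loop count), and that the matrix is stored row-major; the abstract states neither. The names `matvec`, `cycles_c`, `cycles_asm`, and `speedup` are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Plain C matrix-vector product y = A * x.
   Assumption: A is a row-major n-by-m matrix, so the inner loop runs m times
   for each of the n rows, giving n*m multiply-accumulate operations. */
void matvec(size_t n, size_t m, const float *a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        const float *row = a + i * m;   /* start of row i */
        float acc = 0.0f;
        for (size_t j = 0; j < m; ++j)
            acc += row[j] * x[j];       /* two loads, one multiply, one add */
        y[i] = acc;
    }
}

/* Cycle estimate quoted in the abstract for compiler-optimized C code. */
double cycles_c(double n, double m)
{
    return 1.18 * n * m;
}

/* Cycle estimate quoted in the abstract for hand-optimized assembler. */
double cycles_asm(double n, double m)
{
    return 0.5 * n * (m + 16.0);
}

/* Ratio of the two estimates; it tends to 1.18 / 0.5 = 2.36 as m grows. */
double speedup(double n, double m)
{
    return cycles_c(n, m) / cycles_asm(n, m);
}
```

Under these assumptions, n = m = 64 gives about 4833 cycles for the C estimate and 2560 cycles for the assembler estimate, a ratio of roughly 1.89. The fixed term 0.5*n*16 = 8*n in the assembler formula reads like a per-row overhead, which would explain why the advantage is smaller for short rows and only approaches 2.36 when m is much larger than 16.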