An effective and efficient code generation algorithm for uniform loops on non-orthogonal DSP architecture

  • Authors:
  • Yi-Hsuan Lee; Cheng Chen

  • Affiliations:
  • Department of Computer Science and Information Engineering, 1001 Ta Hsueh Road, Hsinchu, 30050, Taiwan, ROC; Department of Computer Science and Information Engineering, 1001 Ta Hsueh Road, Hsinchu, 30050, Taiwan, ROC

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2007

Abstract

To meet ever-increasing demands for higher performance and lower power consumption, many high-end digital signal processors (DSPs) employ a non-orthogonal architecture. Such an architecture is typically characterized by irregular data paths, heterogeneous registers, and multiple memory banks, and adequate compiler support is essential to harvest its benefits. However, conventional compilation techniques do not adapt well to non-orthogonal architectures, and the complexity of these architectures makes compiler design considerably more difficult. The complete code generation process for a non-orthogonal architecture must include several phases. In this paper, we extend our previous study and propose a code generation algorithm, Rotation Scheduling with Spill Codes Avoiding (RSSA), which is suitable for various DSPs with similar architectural features. In addition to presenting the principles and algorithms of RSSA in detail, we evaluate it on several DSP applications targeting the Motorola DSP56000 architecture. The results demonstrate the effectiveness of RSSA: compared with related work, it obtains schedules of minimum length with fewer spill codes. To study how the number of resources influences the scheduling result, we also define a hypothetical machine model that represents a scalable non-orthogonal DSP architecture. Evaluating RSSA on various target architectures, we find that adding accumulators is the most efficient way to reduce spill codes, whereas exploiting instruction-level parallelism requires increasing the numbers of data ALUs and accumulators together. Furthermore, our analysis shows that RSSA is not only effective but also efficient compared with related approaches.
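The abstract does not describe RSSA's internals, but its name indicates that it builds on rotation scheduling, a loop-pipelining technique for uniform loops. The sketch below illustrates only that underlying idea, not RSSA itself, under loudly stated assumptions: a uniform loop is modeled as a data-flow graph whose edge weights are constant dependence distances (delays), every operation takes one step, and there is a single pool of `num_units` identical functional units. Register classes, accumulators, memory banks, and spill avoidance, which are the actual concerns of RSSA, are deliberately ignored. All function names (`topo_order`, `place`, `rotate`, `rotation_schedule`) and the example loop are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of plain rotation scheduling, NOT the RSSA algorithm.
# Assumptions: unit-time operations, `num_units` identical functional units,
# no register, accumulator, or memory-bank constraints, no spill handling.
# Uniform loop as a data-flow graph: edges[(u, v)] = delay, i.e. the constant
# number of iterations between producing a value in u and consuming it in v.


def topo_order(nodes, edges):
    """Kahn's algorithm over zero-delay (intra-iteration) edges."""
    indeg = {n: 0 for n in nodes}
    for (u, v), d in edges.items():
        if d == 0:
            indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for (u, v), d in edges.items():
            if u == n and d == 0:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    assert len(order) == len(nodes), "zero-delay cycle: loop body is not schedulable"
    return order


def place(node, sched, edges, num_units):
    """Put `node` at the earliest step after its zero-delay predecessors with a free unit."""
    usage = defaultdict(int)
    for step in sched.values():
        usage[step] += 1
    step = max((sched[u] + 1 for (u, v), d in edges.items() if v == node and d == 0),
               default=0)
    while usage[step] >= num_units:
        step += 1
    sched[node] = step


def rotate(nodes, edges, sched, num_units):
    """One down-rotation: retime the first-step nodes by 1 and reschedule only them."""
    first = [n for n in nodes if sched[n] == 0]
    # Retiming r(v) = 1 moves one delay from every incoming edge of v to every outgoing edge.
    new_edges = {}
    for (u, v), d in edges.items():
        new_d = d + int(u in first) - int(v in first)
        assert new_d >= 0, "illegal retiming"
        new_edges[(u, v)] = new_d
    # The other nodes keep their relative order and move up one step.
    new_sched = {n: s - 1 for n, s in sched.items() if n not in first}
    for n in first:
        place(n, new_sched, new_edges, num_units)
    low = min(new_sched.values())  # keep the schedule starting at step 0
    return new_edges, {n: s - low for n, s in new_sched.items()}


def schedule_length(sched):
    return max(sched.values()) + 1


def rotation_schedule(nodes, edges, num_units, rotations):
    """Return (length, schedule, retimed edges) of the shortest schedule seen."""
    sched = {}
    for n in topo_order(nodes, edges):  # initial list schedule of the loop body
        place(n, sched, edges, num_units)
    best = (schedule_length(sched), dict(sched), dict(edges))
    for _ in range(rotations):
        edges, sched = rotate(nodes, edges, sched, num_units)
        if schedule_length(sched) < best[0]:
            best = (schedule_length(sched), dict(sched), dict(edges))
    return best


if __name__ == "__main__":
    # Hypothetical loop: A -> B -> C within an iteration; C feeds A two iterations later.
    nodes = ["A", "B", "C"]
    edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}
    length, sched, retimed = rotation_schedule(nodes, edges, num_units=2, rotations=3)
    print(length, sched, retimed)
```

In this example the initial list schedule needs three steps (A, B, C in sequence). A single rotation retimes A, which turns the A-to-B dependence into a loop-carried one, so A and B can share a step and the loop body shrinks to two steps. The total delay around the cycle stays at 2, as retiming requires. With `num_units=1` no rotation can beat three steps, because three unit-time operations must share one unit.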