Timing optimization via nest-loop pipelining considering code size

  • Authors:
  • Qingfeng Zhuge;Chun Jason Xue;Meikang Qiu;Jingtong Hu;Edwin H.-M. Sha

  • Affiliations:
  • Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, USA;Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA;Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2008

Abstract

Embedded systems have strict timing and code size requirements. Software pipelining is one of the most important optimization techniques for improving the execution time of loops by increasing the parallelism among successive loop iterations. However, no effective techniques exist for solving the software pipelining problem on nested loops. Existing software pipelining techniques for single loops can only explore the parallelism of the innermost loop, so the final timing performance is inferior. While multi-dimensional (MD) retiming can explore the outer loop parallelism, it introduces large overheads in loop index generation and code size due to loop transformation. In this paper, we show how the computation time and code size of a pipelined nested loop are affected by execution sequence and retiming, assuming there is no loop unfolding. We present the theory of Software PIpelining for NEsted loops (SPINE) to reveal the relationship among the computation time of an iteration, the execution sequence, and the software pipelining degree of a nested loop using retiming concepts. Based on this fundamental understanding of the properties of software pipelining for nested loops, we propose two SPINE algorithms. The SPINE-FULL algorithm generates fully parallelized loops with minimal overheads. The SPINE-ROW-WISE algorithm achieves the maximal parallelism in an iteration with a fixed row-wise execution sequence; therefore, the overheads due to loop transformation are minimal. Our technique can be directly applied to imperfect nested loops. Experimental results show that SPINE improves the execution time of the pipelined loop by 71.7% on average compared with the standard software pipelining technique, and reduces the code size by 69.5% on average compared with the MD retiming technique.
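
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of standard single-loop software pipelining viewed as retiming. It is not the SPINE-FULL or SPINE-ROW-WISE algorithm, which the abstract does not specify. The loop body, the function names `original` and `pipelined`, and the choice of a two-statement loop are assumptions made purely for illustration. The sketch shows how shifting one statement by an iteration removes the dependence inside the loop body, at the cost of an extra prologue and epilogue, which is the kind of code size overhead the abstract refers to.

```python
def original(x):
    """Reference loop: S2 depends on S1 within the same iteration."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = x[i] * 2      # S1
        b[i] = a[i] + 1      # S2 must wait for S1 of iteration i
    return b


def pipelined(x):
    """Same computation after shifting S1 one iteration ahead (retiming).

    Inside the new loop body, S1 works on iteration i + 1 while S2 works
    on iteration i, so the two statements no longer depend on each other
    and could be issued in parallel on suitable hardware.
    """
    n = len(x)
    a = [0] * n
    b = [0] * n
    if n == 0:
        return b
    a[0] = x[0] * 2              # prologue: S1 of iteration 0
    for i in range(n - 1):
        a[i + 1] = x[i + 1] * 2  # S1 of iteration i + 1
        b[i] = a[i] + 1          # S2 of iteration i
    b[n - 1] = a[n - 1] + 1      # epilogue: S2 of the last iteration
    return b
```

Both functions return the same list for any input. The pipelined version has a shorter critical path per iteration but more code, because of the prologue and epilogue statements.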