Timing optimization via nest-loop pipelining considering code size

Authors:
Qingfeng Zhuge; Chun Jason Xue; Meikang Qiu; Jingtong Hu; Edwin H.-M. Sha
Affiliations:
Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, USA;Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA;Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA
Venue:
Microprocessors & Microsystems
Year:
2008

Abstract

Embedded systems have strict timing and code size requirements. Software pipelining is one of the most important optimization techniques for improving the execution time of loops by increasing the parallelism among successive loop iterations. However, no effective techniques exist for solving the software pipelining problem on nested loops. Existing software pipelining techniques for single loops can only explore the parallelism of the innermost loop, so the final timing performance is inferior. While multi-dimensional (MD) retiming can explore outer loop parallelism, it introduces large overheads in loop index generation and code size due to loop transformation. In this paper, we show how the computation time and code size of a pipelined nested loop are affected by execution sequence and retiming, assuming there is no loop unfolding. We present the theory of Software PIpelining for NEsted loops (SPINE) to reveal the relationship among the computation time of an iteration, the execution sequence, and the software pipelining degree of a nested loop using retiming concepts. Two SPINE algorithms are proposed based on the fundamental understanding of the properties of software pipelining for nested loops. The SPINE-FULL algorithm generates fully parallelized loops with minimal overheads. The SPINE-ROW-WISE algorithm achieves the maximal parallelism in an iteration with a fixed row-wise execution sequence. Therefore, the overheads due to loop transformation are minimal. Our technique can be directly applied to imperfect nested loops. Experimental results show that the execution time of the pipelined loop generated by SPINE improves by 71.7% on average compared with that generated by the standard software pipelining technique. The average code size is reduced by 69.5% compared with that generated by the MD retiming technique.
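
The retiming concepts the abstract relies on can be illustrated with a small sketch. This is not the SPINE-FULL or SPINE-ROW-WISE algorithm, whose details the abstract does not give. It only shows, under stated assumptions, how a two-dimensional retiming changes the delay vectors of a nested-loop dependence graph while keeping a row-wise execution sequence legal. The two-node graph, the unit computation times, the retiming values, and the function names `retime`, `legal_row_wise`, and `cycle_period` are all hypothetical. Delay vectors are written as (outer-loop distance, inner-loop distance).

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]          # (outer-loop distance, inner-loop distance)
Edge = Tuple[str, str, Vec]    # (source node, target node, delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply retiming r: the new delay of u -> v is d + r(u) - r(v)."""
    return [
        (u, v, (d[0] + r[u][0] - r[v][0], d[1] + r[u][1] - r[v][1]))
        for u, v, d in edges
    ]


def legal_row_wise(edges: List[Edge]) -> bool:
    """Row-wise order needs every delay vector to be lexicographically
    non-negative (Python compares tuples lexicographically).
    Zero-delay cycles are detected separately by cycle_period."""
    return all(d >= (0, 0) for _, _, d in edges)


def cycle_period(edges: List[Edge], t: Dict[str, int]) -> int:
    """Longest total computation time along any path of zero-delay edges."""
    succ: Dict[str, List[str]] = {u: [] for u in t}
    for u, v, d in edges:
        if d == (0, 0):
            succ[u].append(v)
    memo: Dict[str, int] = {}
    visiting = set()

    def longest(u: str) -> int:
        if u in memo:
            return memo[u]
        if u in visiting:
            raise ValueError("zero-delay cycle: graph is not schedulable")
        visiting.add(u)
        best = t[u] + max((longest(v) for v in succ[u]), default=0)
        visiting.discard(u)
        memo[u] = best
        return best

    return max(longest(u) for u in t)


# Hypothetical example: B uses A's result in the same iteration, and
# A uses B's result from the previous outer-loop iteration.
times = {"A": 1, "B": 1}
edges = [("A", "B", (0, 0)), ("B", "A", (1, 0))]

# Hypothetical retiming: shift A by one inner-loop iteration.
retimed = retime(edges, {"A": (0, 1), "B": (0, 0)})

print(cycle_period(edges, times), cycle_period(retimed, times))
print(retimed, legal_row_wise(retimed))
```

In this example the only dependence cycle is carried by the outer loop. After retiming, both delay vectors are non-zero and lexicographically positive, so A and B of one iteration no longer depend on each other and the row-wise order is still valid. The cycle period drops from 2 to 1 without changing the loop's execution sequence.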