Efficient Pipelining of Nested Loops: Unroll-and-Squash

Authors:
Darin Petkov, Randolph E. Harr, Saman P. Amarasinghe
Venue:
IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Abstract

The size and complexity of current custom VLSI designs have forced the use of high-level programming languages to describe hardware, and of compiler and synthesis technology to map abstract designs into silicon. Since streaming data processing in DSP applications is typically described by loop constructs in a high-level language, loops are the most critical portions of the hardware description, and special techniques have been developed to synthesize them optimally. In this paper, we introduce a new method for mapping nested loops into hardware and pipelining them efficiently. It achieves fine-grain parallelism even on inner loops with strong intra- and inter-iteration data dependences and, by sharing resources economically, improves performance at the cost of a small amount of additional area. We implemented the transformation within the Nimble Compiler environment and evaluated its performance on several signal-processing benchmarks. The method achieves up to a 2x improvement in area efficiency compared to the best known optimization techniques.
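The abstract only summarizes the transformation. As a hedged illustration of the idea the title names, the Python sketch below simulates in software how several independent outer-loop iterations (data sets) might be interleaved through a single staged copy of an inner-loop body that carries a recurrence, so that every stage stays busy even though one data set can never overlap with itself. It assumes that the outer iterations are independent, that the inner body splits into equal one-cycle stages, and that the number of data sets in flight equals the number of stages. The names `run_original`, `run_squashed`, and the example stage functions are hypothetical. This is not the authors' Nimble Compiler implementation, nor their exact prologue/epilogue or register-rotation scheme.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage type: takes the loop-carried value and the inner index j.
Stage = Callable[[int, int], int]


def run_original(inits: Sequence[int], stages: Sequence[Stage], m: int) -> List[int]:
    """Reference loop nest. The inner body (stages applied in order) depends on
    the previous inner iteration, so a single data set cannot be pipelined."""
    results = []
    for x in inits:                  # outer loop (iterations assumed independent)
        for j in range(m):           # inner loop with a loop-carried dependence on x
            for stage in stages:
                x = stage(x, j)
        results.append(x)
    return results


def run_squashed(inits: Sequence[int], stages: Sequence[Stage],
                 m: int) -> Tuple[List[int], int]:
    """Simulate the squashed loop: up to len(stages) data sets share one copy
    of each stage. Each cycle, every occupied stage fires on a different data
    set, then the data sets rotate one stage forward. Returns (results, cycles)."""
    s = len(stages)
    if m <= 0 or s == 0:
        return list(inits), 0        # nothing to execute
    results: List[int] = []
    cycles = 0
    for base in range(0, len(inits), s):        # strip-mine the outer loop by s
        values = list(inits[base:base + s])     # one register per data set
        pending = list(range(len(values)))      # data sets not yet in the pipeline
        slots: List[Optional[Tuple[int, int]]] = [None] * s  # (data set, j) per stage
        done = 0
        while done < len(values):
            if slots[0] is None and pending:    # prologue: feed the next data set
                slots[0] = (pending.pop(0), 0)
            for k, slot in enumerate(slots):    # all stages fire in the same cycle
                if slot is not None:
                    d, j = slot
                    values[d] = stages[k](values[d], j)
            cycles += 1
            last = slots[-1]                    # data set that finished iteration j
            slots = [None] + slots[:-1]         # rotate data sets one stage forward
            if last is not None:
                d, j = last
                if j + 1 < m:
                    slots[0] = (d, j + 1)       # wrap around for inner iteration j + 1
                else:
                    done += 1                   # epilogue: data set leaves the pipeline
        results.extend(values)
    return results, cycles


if __name__ == "__main__":
    stages = [lambda x, j: x + j, lambda x, j: 2 * x, lambda x, j: x - 1]
    inits = [1, 2, 3, 4, 5, 6]
    squashed, cycles = run_squashed(inits, stages, m=4)
    assert squashed == run_original(inits, stages, m=4)
    # prints: squashed cycles: 28 sequential cycles: 72
    print("squashed cycles:", cycles, "sequential cycles:", len(inits) * 4 * len(stages))
```

In this model, a full group of s data sets finishes in m*s + s - 1 cycles instead of s*m*s, while the group shares one copy of each stage. Unrolling and jamming the same nest would instead replicate the stages once per data set.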