Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
Advanced compiler design and implementation
Advanced compiler design and implementation
CORDS: hardware-software co-synthesis of reconfigurable real-time distributed embedded systems
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Hardware-software co-design of embedded reconfigurable architectures
Proceedings of the 37th Annual Design Automation Conference
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Modern Compiler Implementation in C
Modern Compiler Implementation in C
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Automatic Synthesis of Parallel Programs Targeted to Dynamically Reconfigurable Logic Arrays
FPL '95 Proceedings of the 5th International Workshop on Field-Programmable Logic and Applications
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exploitation of parallelism to nested loops with dependence cycles
Journal of Systems Architecture: the EUROMICRO Journal
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Journal of VLSI Signal Processing Systems
Enhancing self-scheduling algorithms via synchronization and weighting
Journal of Parallel and Distributed Computing
Software Pipelining in Nested Loops with Prolog-Epilog Merging
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Improving performance of nested loops on reconfigurable array processors
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Multi-dimensional kernel generation for loop nest software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 0.00 |
The size and complexity of current custom VLSI have forced the use of high-level programming languages to describe hardware, and compiler and synthesis technology to map abstract designs into silicon. Since streaming data processing in DSP applications is typically described by loop constructs in a high-level language, loops are the most critical portions of the hardware description and special techniques are developed to optimally synthesize them. In this paper, we introduce a new method for mapping and pipelining nested loops efficiently into hardware. It achieves fine-grain parallelism even on strong intra- and inter-iteration data-dependent inner loops and, by sharing resources economically, improves performance at the expense of a small amount of additional area. We implemented the transformation within the Nimble Compiler environment and evaluated its performance on several signal-processing benchmarks. The method achieves up to 2x improvement in the area efficiency compared to the best known optimization techniques.