Recently, FPGA (field-programmable gate array) technology has made significant advances in both speed and capacity. Millions of logic gates are now available for reconfigurable programming. To fully exploit the potential of so many programmable devices, a powerful design methodology must be developed. In this paper, we propose a novel, systematic computer-aided design methodology that can efficiently implement deeply nested do-loop algorithms on an FPGA. Specifically, our design methodology maps the loop dependence graph onto a linear array of locally connected processing elements to exploit parallelism. Because this linear array of processors has a regular structure, it can be easily implemented on an FPGA. While this method is based on conventional systolic array design methodology, our proposed approach exhibits two distinct features that contribute to its superior performance: 1) We developed a novel multiple-order dependence graph representation that can efficiently represent distinct, yet correct, algorithm execution orders. 2) We introduced new FPGA-specific architectural constraints into the mapping process. As such, FPGA implementations based on our approach use far fewer lookup tables while achieving superior performance.
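To make the mapping step concrete, the following is a minimal sketch of the conventional systolic-array mapping that the abstract names as its starting point. It is not the multiple-order dependence graph method proposed in the paper. It assumes a loop nest with uniform dependence vectors, a linear schedule vector, and a linear allocation vector; the function name `map_to_linear_array`, its parameters, and the matrix-vector example are all hypothetical and chosen only for illustration.

```python
from itertools import product


def map_to_linear_array(bounds, deps, schedule, allocation):
    """Assign every loop iteration a (processor, time) slot.

    bounds     -- loop trip counts, e.g. (3, 4) for a 3 x 4 nest (illustrative)
    deps       -- uniform dependence vectors of the loop nest
    schedule   -- linear schedule vector: time = schedule . index
    allocation -- linear allocation vector: processor = allocation . index
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    for d in deps:
        # Causality: a dependent iteration must run strictly later.
        if dot(schedule, d) <= 0:
            raise ValueError(f"schedule violates dependence {d}")
        # Local connectivity: data moves at most one processor per hop.
        if abs(dot(allocation, d)) > 1:
            raise ValueError(f"dependence {d} needs a non-local link")

    slots = {}
    for idx in product(*(range(n) for n in bounds)):
        slot = (dot(allocation, idx), dot(schedule, idx))
        # Two iterations may not share one processor in the same cycle.
        if slot in slots:
            raise ValueError(f"conflict at {slot}: {slots[slot]} and {idx}")
        slots[slot] = idx
    return slots


# Hypothetical example: y[i] += A[i][j] * x[j] on a 3 x 4 index space.
# (0, 1) carries the running sum y[i]; (1, 0) passes x[j] to the next row.
slots = map_to_linear_array(
    bounds=(3, 4),
    deps=[(0, 1), (1, 0)],
    schedule=(1, 1),
    allocation=(1, 0),
)
num_pes = len({pe for pe, _ in slots})
num_steps = len({t for _, t in slots})
```

In this example each row `i` of the index space becomes one processing element, the running sum stays inside that element, and each `x[j]` value moves to the neighboring element one cycle later. A different choice of schedule or allocation vector trades the processor count against the total number of time steps.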