IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Window operations, which are both computationally and data intensive, are frequently used in image compression, pattern recognition, and digital signal processing. Reconfigurable hardware boards provide a convenient and flexible way to speed up these algorithms. In this paper, we design a three-level memory structure that fully exploits both inner-loop and outer-loop data reuse in window operations, and we use shift registers to simplify the hardware design. We then present a design space exploration algorithm that finds a high-performance design without going through the time-consuming hardware design process for each new algorithm. The algorithm derives three upper bounds, from the area, memory bandwidth, and on-chip memory constraints, and uses them to determine the block structure that fully utilizes the resources available on the board.
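
As a rough software illustration of the inner-loop data reuse that shift registers provide, the sketch below slides a k x k window across an image and, at each step, shifts out the oldest column and loads only one new column instead of re-reading all k x k pixels. This is not the paper's architecture: the function name `sliding_window_sums`, the use of a plain sum as the window operation, and the list-based "registers" are all assumptions made for illustration, and outer-loop (row-to-row) reuse is not modeled.

```python
def sliding_window_sums(image, k):
    """Sum every k x k window of a 2-D list.

    Hypothetical illustration: `window` plays the role of a bank of shift
    registers holding k columns of k pixels each.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        # Fill the registers with the first k columns of this band of rows.
        window = [[image[r + i][j] for i in range(k)] for j in range(k)]
        row_out = [sum(sum(col) for col in window)]
        for c in range(k, cols):
            # Shift: discard the oldest column and load one new column,
            # so each step reads k new pixels rather than k * k.
            window.pop(0)
            window.append([image[r + i][c] for i in range(k)])
            row_out.append(sum(sum(col) for col in window))
        out.append(row_out)
    return out


if __name__ == "__main__":
    demo = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    print(sliding_window_sums(demo, 2))
```

In this sketch each horizontal step fetches k pixels instead of k * k; a hardware design would additionally buffer rows on chip so that vertically adjacent windows also share data.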