Automatically generating a hardware implementation of a given algorithm is generally difficult, especially when data dependencies span multiple iterations, as in iterative stencil loops (ISLs). In this paper, we introduce an automatic design flow that extracts parallelism from an ISL algorithm and performs a design space exploration to identify its best FPGA hardware implementation in terms of both area and throughput. Experimental results show that the proposed methodology generates hardware designs whose performance is comparable to that of manually optimized solutions and orders of magnitude higher than that of implementations generated by commercial high-level synthesis tools.
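To make the notion of cross-iteration dependencies concrete, the following is a minimal sketch of a one-dimensional ISL. It is an illustration only, not the design flow proposed in the paper: the 3-point averaging stencil, the fixed boundary cells, and the names stencil_step, iterative_stencil_loop, and dependency_cone are all assumptions chosen for the example.

```python
def stencil_step(grid):
    """Apply one sweep of a 3-point averaging stencil (hypothetical kernel).

    Boundary cells are copied unchanged; every interior cell reads its two
    neighbours from the previous sweep.
    """
    new = list(grid)
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new


def iterative_stencil_loop(grid, iterations):
    """Repeat the stencil sweep; each sweep consumes the previous sweep's output."""
    for _ in range(iterations):
        grid = stencil_step(grid)
    return grid


def dependency_cone(index, iterations, size, radius=1):
    """Input cells that one output cell can depend on after `iterations` sweeps.

    The window widens by `radius` cells on each side per sweep, which is why
    dependencies span multiple iterations.
    """
    lo = max(0, index - radius * iterations)
    hi = min(size - 1, index + radius * iterations)
    return range(lo, hi + 1)


if __name__ == "__main__":
    print(iterative_stencil_loop([0.0, 0.0, 3.0, 0.0, 0.0], 1))
    print(list(dependency_cone(4, 2, 9)))
```

Under these assumptions, the set of input cells feeding a single output grows with every sweep, so any tiling or pipelining of the loop has to account for that widening window.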