A high-level synthesis flow for the implementation of iterative stencil loop algorithms on FPGA devices

Authors:
Alessandro Antonio Nacci;Vincenzo Rana;Francesco Bruschi;Donatella Sciuto;Ivan Beretta;David Atienza
Affiliations:
Politecnico di Milano, Milan, Italy;Politecnico di Milano, Milan, Italy;Politecnico di Milano, Milan, Italy;Politecnico di Milano, Milan, Italy;École Polytechnique Fédérale de Lausanne, Embedded Systems Laboratory (ESL), Lausanne, Switzerland;École Polytechnique Fédérale de Lausanne, Embedded Systems Laboratory (ESL), Lausanne, Switzerland
Venue:
Proceedings of the 50th Annual Design Automation Conference
Year:
2013

Abstract

The automatic generation of hardware implementations for a given algorithm is generally a difficult task, especially when data dependencies span multiple iterations, as in iterative stencil loops (ISLs). In this paper, we introduce an automatic design flow that extracts parallelism from an ISL algorithm and performs a design space exploration to identify its best FPGA hardware implementation in terms of both area and throughput. Experimental results show that the proposed methodology generates hardware designs whose performance is comparable to that of manually optimized solutions and orders of magnitude higher than that of implementations generated by commercial high-level synthesis tools.
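
To make the cross-iteration dependency mentioned in the abstract concrete, the following is a minimal software sketch of a generic ISL. It is not taken from the paper: the 4-neighbor averaging kernel, the fixed boundary, the 5x5 input, and the names `stencil_step` and `iterative_stencil_loop` are all assumptions chosen for illustration. Each sweep reads only values produced by the previous sweep, which is the dependency pattern that makes hardware generation for ISLs hard.

```python
def stencil_step(grid):
    """Apply one 4-neighbor averaging sweep; boundary cells are kept fixed."""
    rows, cols = len(grid), len(grid[0])
    # Copy the grid so every read below sees only the previous iteration.
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return out


def iterative_stencil_loop(grid, iterations):
    """Repeat the sweep; iteration t+1 reads neighbors produced by iteration t."""
    for _ in range(iterations):
        grid = stencil_step(grid)
    return grid


# Hypothetical 5x5 input with a single non-zero cell in the center.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 16.0

after_one = iterative_stencil_loop(grid, 1)
after_two = iterative_stencil_loop(grid, 2)
```

In this sketch the cells within one sweep are independent of each other and could be computed in parallel, while consecutive sweeps cannot be reordered freely. A design flow of the kind the abstract describes has to trade off these two dimensions.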