Automatic generation of FPGA-specific pipelined accelerators

Authors:
Christophe Alias;Bogdan Pasca;Alexandru Plesco
Affiliations:
LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, Lyon Cedex 07, France;LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, Lyon Cedex 07, France;LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, Lyon Cedex 07, France
Venue:
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Year:
2011

Abstract

Recent increases in circuit complexity have made high-level synthesis tools a necessity in digital circuit design. However, these tools have several limitations, one of which concerns the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with pipelined floating-point operators for regular codes with perfect loop nests. The part of the program to be mapped to the operator is identified first. The program is then scheduled so that each intermediate result is produced exactly when the operator needs it, which avoids both pipeline stalls and temporary buffers. Finally, we show how to generate the VHDL code for the control unit and how to link it with specialized pipelined floating-point operators generated by the open-source FloPoCo tool. The method has been implemented in the Bee research compiler. Experiments on DSP kernels are promising, with at least 94% efficient utilization of the pipelined operators for a complex kernel.
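The scheduling idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm and not part of Bee or FloPoCo; it is a minimal simulation, under stated assumptions, of a single pipelined adder with a latency of `depth` cycles that accumulates row sums `s[i] += a[i][j]`. The function name `simulate`, the iteration orders, and the cost model (one operation issued per cycle, a result reusable `depth` cycles after issue) are all hypothetical choices made for illustration.

```python
def simulate(order, depth):
    """Toy model of one pipelined adder computing s[i] += a[i][j].

    order: sequence of (i, j) iterations in execution order (hypothetical input).
    depth: assumed pipeline latency of the adder, in cycles.
    Returns (utilization, buffered_cycles):
      utilization     = operations issued / issue slots elapsed,
      buffered_cycles = total cycles finished results wait before being reused.
    """
    ready = {}    # ready[i]: cycle at which the latest value of s[i] leaves the pipeline
    cycle = 0     # next free issue slot
    buffered = 0  # cycles results spend in temporary storage
    for i, _j in order:
        if i in ready:
            # The operation depends on the previous partial sum of row i.
            issue = max(cycle, ready[i])   # stall until that result is available
            buffered += issue - ready[i]   # result sat idle if it arrived early
        else:
            issue = cycle                  # first operation on row i: no dependence
        ready[i] = issue + depth
        cycle = issue + 1
    return len(order) / cycle, buffered


if __name__ == "__main__":
    rows, cols, depth = 4, 3, 4
    # Row-by-row order: consecutive operations depend on each other, so the pipeline stalls.
    naive = [(i, j) for i in range(rows) for j in range(cols)]
    # Interchanged order: `rows` independent accumulations are interleaved.
    interleaved = [(i, j) for j in range(cols) for i in range(rows)]
    print("naive      :", simulate(naive, depth))
    print("interleaved:", simulate(interleaved, depth))
    # More interleaved rows than pipeline stages: no stalls, but results must be buffered.
    wide = [(i, j) for j in range(cols) for i in range(6)]
    print("wide       :", simulate(wide, depth))
```

In this model, interleaving exactly `depth` independent accumulations makes each partial sum leave the pipeline in the same cycle it is consumed again, so the operator is fully used and nothing is buffered. Interleaving more than `depth` accumulations keeps the operator busy but forces results to wait in temporary storage, and the row-by-row order leaves the operator idle most of the time.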