FPGA-specific synthesis of loop-nests with pipelined computational cores

Authors:
Christophe Alias;Bogdan Pasca;Alexandru Plesco
Affiliations:
LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France;LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France;LIP (ENSL-CNRS-Inria-UCBL), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France
Venue:
Microprocessors & Microsystems
Year:
2012

Abstract

The increased capacity and enhanced features of modern FPGAs open new opportunities for their use as application accelerators. However, for FPGAs to be accepted as mainstream acceleration solutions, long design cycles must be shortened by using high-level synthesis (HLS) tools in the design process. Current HLS tools targeting FPGAs have several limitations, including the inefficient use of deeply pipelined arithmetic operators, which are commonly encountered in high-throughput FPGA designs. We focus here on the efficient generation of FPGA-specific hardware accelerators for regular codes with perfect loop nests whose inner statements are implemented as a pipelined arithmetic operator, as is often the case for scientific codes using floating-point arithmetic. We propose a semi-automatic code generation process in which the arithmetic operator is identified and generated. Its pipeline information is then used to reschedule the initial program execution so as to keep the operator's pipeline as "busy" as possible while minimizing memory accesses. Next, we show how our method can be used as a tool to generate control FSMs for multiple parallel computing cores. Finally, we show that accounting for the application's accuracy needs allows designing smaller and faster operators.
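
The effect of rescheduling on pipeline occupancy can be illustrated with a small sketch. It is not the authors' algorithm: it is a toy cycle count, assuming a single pipelined operator with a hypothetical latency `LATENCY`, and a two-level loop nest in which operation `(i, j)` updates accumulator `i` (as in a matrix-vector product). The function name `cycles_needed` and the sizes are illustrative assumptions.

```python
from itertools import product


def cycles_needed(schedule, latency):
    """Cycles needed to push every operation of `schedule` through one
    pipelined operator of the given latency.

    Each operation is a pair (i, j) that updates accumulator i, so it cannot
    be issued before the previous update of accumulator i has left the pipeline.
    """
    ready_at = {}  # accumulator index -> cycle at which its latest value is available
    cycle = 0      # earliest cycle at which the next operation may be issued
    for i, _j in schedule:
        issue = max(cycle, ready_at.get(i, 0))  # stall while the operand is in flight
        ready_at[i] = issue + latency
        cycle = issue + 1                       # at most one issue per cycle
    return max(ready_at.values())               # cycle at which the last result is ready


N, LATENCY = 8, 4  # assumed problem size and operator pipeline depth

# Original order: j innermost, so consecutive operations depend on each other.
row_major = list(product(range(N), range(N)))

# Rescheduled order: i innermost, so N independent operations are issued
# between two updates of the same accumulator.
interchanged = [(i, j) for j in range(N) for i in range(N)]

print("original order:   ", cycles_needed(row_major, LATENCY), "cycles")
print("rescheduled order:", cycles_needed(interchanged, LATENCY), "cycles")
```

In this toy model the rescheduled order issues one operation per cycle whenever `N >= LATENCY`, because every dependent pair is separated by at least the pipeline depth. The sketch ignores memory traffic, which the abstract names as a second objective of the rescheduling.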