POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
ICS '96 Proceedings of the 10th international conference on Supercomputing
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compaan: deriving process networks from Matlab for embedded signal processing architectures
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
Loop tiling for parallelism
Accuracy and Stability of Numerical Algorithms
Accuracy and Stability of Numerical Algorithms
Scanning Polyhedra without Do-loops
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
SPARK: A High-Lev l Synthesis Framework For Applying Parallelizing Compiler Transformations
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Translating affine nested-loop programs to process networks
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Floating-Point Accumulation Circuit for Matrix Applications
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Integer and floating-point constant multipliers for FPGAs
ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
FPGA Floating Point Datapath Compiler
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
Proceedings of the Conference on Design, Automation and Test in Europe
Multiplicative Square Root Algorithms for FPGAs
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Compilation Techniques for Reconfigurable Architectures
Compilation Techniques for Reconfigurable Architectures
Designing Custom Arithmetic Data Paths with FloPoCo
IEEE Design & Test
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
The increased capacity and enhanced features of modern FPGAs opens new opportunities for their use as application accelerators. However, for FPGAs to be accepted as mainstream acceleration solutions, long design cycles must be shortened by using high-level synthesis tools in the design process. Current HLS tools targeting FPGAs have several limitations including the inefficient use of deeply pipelined arithmetic operators, commonly encountered in high-throughput FPGA designs. We focus here on the efficient generation of FPGA-specific hardware accelerators for regular codes with perfect loop nests where inner statements are implemented as a pipelined arithmetic operator, which is often the case of scientific codes using floating-point arithmetic. We propose a semi-automatic code generation process where the arithmetic operator is identified and generated. Its pipeline information is used to reschedule the initial program execution in order to keep the operator's pipeline as ''busy'' as possible, while minimizing memory access. Next, we show how our method can be used as a tool to generate control FSMs for multiple parallel computing cores. Finally, we show that accounting for the application's accuracy needs allows designing smaller and faster operators.