POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
ICS '96 Proceedings of the 10th international conference on Supercomputing
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Loop tiling for parallelism
Scanning Polyhedra without Do-loops
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
SPARK: A High-Lev l Synthesis Framework For Applying Parallelizing Compiler Transformations
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Multiplicative Square Root Algorithms for FPGAs
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Journal of Parallel and Distributed Computing
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
Recent increase in the complexity of the circuits has brought high-level synthesis tools as a must in the digital circuit design. However, these tools come with several limitations, and one of them is the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with floating-point pipelined operators for regular codes with perfect loop nests. The part to be mapped to the operator is identified, then the program is scheduled so that each intermediate result is produced exactly at the time it is needed by the operator, avoiding pipeline stalling and temporary buffers. Finally, we show how to generate the VHDL code for the control unit and how to link it with specialized pipelined floating-point operators generated using the open-source FloPoCo tool. The method has been implemented in the Bee research compiler and experimental results on DSP kernels show promising results with a minimum of 94% efficient utilization of the pipelined operators for a complex kernel.