Overcoming the challenges to feedback-directed optimization (Keynote Talk)
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Floating Point to Fixed Point Conversion of C Code
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
A Library of Parameterized Floating-Point Modules and Their Use
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Synchronous Interlocked Pipelines
ASYNC '02 Proceedings of the 8th International Symposium on Asynchronous Circuits and Systems
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization
Unifying Bit-Width Optimisation for Fixed-Point and Floating-Point Designs
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
When FPGAs are better at floating-point than microprocessors
Proceedings of the 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays
3D finite difference computation on GPUs using CUDA
Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units
ARITH '11 Proceedings of the 2011 IEEE 20th Symposium on Computer Arithmetic
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
The key to enabling widespread use of FPGAs for algorithm acceleration is allowing programmers to create efficient designs without the time-consuming hardware design process. Programmers are accustomed to developing scientific and mathematical algorithms in high-level languages (C/C++) using floating-point data types. Although easy to implement, the dynamic range provided by floating point is unnecessary in many applications; more efficient implementations can be realized with fixed-point arithmetic. While this topic has been studied previously [Han et al. 2006; Olson et al. 1999; Gaffar et al. 2004; Aamodt and Chow 1999], full automation has remained out of reach. We present a novel design flow for cases where FPGAs are used to offload computations from a microprocessor. Our LLVM-based algorithm inserts value-profiling code into an unmodified C/C++ application to guide its automatic conversion to fixed point. This allows fast and accurate design-space exploration on a host microprocessor before any accelerators are mapped to the FPGA. Experimental results demonstrate that fixed-point conversion can yield resource reductions of up to 2x--3x, minimizes embedded RAM usage, and achieves a 13%--22% higher Fmax than the original floating-point implementation. In a case study, we show that using our algorithm in conjunction with a High-Level Synthesis (HLS) tool yields a 17% reduction in logic and a 24% reduction in register usage.
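To make the floating-to-fixed-point idea concrete, the sketch below shows the kind of arithmetic such a conversion emits. The Q16.16 format, the `fix16` type name, and the helper functions are illustrative assumptions, not the paper's actual output; the described flow would instead pick integer and fraction bit widths per variable from profiled value ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative Q16.16 fixed-point type: 16 integer bits, 16 fraction bits.
 * A profile-guided flow would size these fields per variable. */
typedef int32_t fix16;
#define FIX16_FRAC_BITS 16

/* Convert a double to fixed point by scaling and rounding to nearest. */
static inline fix16 fix16_from_double(double x) {
    return (fix16)(x * (1 << FIX16_FRAC_BITS) + (x >= 0 ? 0.5 : -0.5));
}

/* Convert back to double by dividing out the scale factor. */
static inline double fix16_to_double(fix16 x) {
    return (double)x / (1 << FIX16_FRAC_BITS);
}

/* Fixed-point multiply: widen to 64 bits so the product of two scaled
 * values does not overflow, then shift the extra scale factor back out. */
static inline fix16 fix16_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> FIX16_FRAC_BITS);
}
```

Because all operations reduce to integer adds, multiplies, and shifts, a design in this form maps to far fewer FPGA resources than an IEEE floating-point datapath, at the cost of a bounded range and precision that profiling must validate.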