Advances in FPGA technology have led to dramatic improvements in double-precision floating-point performance. Modern FPGAs offer several GFLOPS of raw computing power. Unfortunately, this computing power is distributed across 30 floating-point units, each with a latency of more than 10 cycles. To keep these units busy, the user must find two orders of magnitude more parallelism than is typically exploited in a single microprocessor; thus, it is not clear that the computational power of FPGAs can be exploited across a wide range of algorithms. This paper explores three implementation alternatives for the Fast Fourier Transform (FFT) on FPGAs. The implementations are compared in terms of sustained performance and memory requirements for various FFT sizes and FPGA sizes. The results indicate that FPGAs are competitive with microprocessors in terms of performance and that the "correct" FFT implementation depends on the size of the transform and the size of the FPGA.
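The parallelism requirement stated in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes each floating-point unit is fully pipelined and accepts one new operation per cycle, so that the number of independent operations needed to keep all units busy is the product of unit count and latency. The function name `required_in_flight_ops` is hypothetical, and because the abstract says "over 10 cycles", the result is a lower bound.

```python
def required_in_flight_ops(num_units: int, latency_cycles: int) -> int:
    """Independent operations that must be in flight to keep every unit busy.

    Assumption (not from the paper): each unit is fully pipelined and can
    start one new operation per cycle, so a unit with a latency of L cycles
    holds L operations at once.
    """
    if num_units < 0 or latency_cycles < 0:
        raise ValueError("unit count and latency must be non-negative")
    return num_units * latency_cycles


# Figures from the abstract: 30 units, latency of (at least) 10 cycles each.
fpga_ops = required_in_flight_ops(num_units=30, latency_cycles=10)
print(fpga_ops)  # 300
```

Under these assumptions, at least 300 independent floating-point operations must be available at any time, which is consistent with the "two orders of magnitude" figure if a single microprocessor exploits only a handful of independent operations.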