Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

Authors:
Ben Cope;Peter Y. K. Cheung;Wayne Luk;Lee Howes
Affiliations:
Imperial College London, London;Imperial College London, London;Imperial College London, London;Imperial College London
Venue:
IEEE Transactions on Computers
Year:
2010

Abstract

A systematic approach to comparing the graphics processor (GPU) with reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and to two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). For arithmetic-intensive algorithms, each device achieves a speedup of two orders of magnitude over a general-purpose processor. The FPGA is superior to the GPU for algorithms requiring large numbers of regular memory accesses, while the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, a customized data path implemented in an FPGA exceeds GPU performance by up to eight times. The trends identified in the analysis are then extended to newer and future technologies.