Performance and toolchain of a combined GPU/FPGA desktop (abstract only)

Authors:
Bruno da Silva;An Braeken;Erik H. D'Hollander;Abdellah Touhafi;Jan G. Cornelis;Jan Lemeire
Affiliations:
Erasmus University College, Brussels, Belgium;Erasmus University College, Brussels, Belgium;Ghent University, Ghent, Belgium;Erasmus University College, Brussels, Belgium;Free University of Brussels, Brussels, Belgium;Free University of Brussels, Brussels, Belgium
Venue:
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Year:
2013

References (3)

Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software

A view of the parallel computing landscape
Communications of the ACM - A View of Parallel Computing

The "Chimera": an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform
International Journal of Reconfigurable Computing - Special issue on High-Performance Reconfigurable Computing

Cited by (1)

Performance modeling for FPGAs: extending the roofline model with high-level synthesis tools
International Journal of Reconfigurable Computing

Abstract

Low-power, high-performance computing now relies on accelerator cards to speed up calculations. Combining the power of GPUs with the flexibility of FPGAs enlarges the scope of problems that can be accelerated [2, 3]. We describe the performance analysis of a desktop equipped with a Tesla 2050 GPU and a Virtex-6 LX240T FPGA. First, the balance between the I/O and the raw peak performance is depicted using the roofline model [4]. Next, the performance of a number of image processing algorithms is measured and the results are mapped onto the roofline graph. This makes it possible to compare the GPU and the FPGA and to optimize the algorithms for both accelerators. A programming toolchain is implemented, consisting of OpenCL for the GPU and several High-Level Synthesis (HLS) compilers for the FPGA. Our results show that the HLS compilers outperform handwritten code and offer performance comparable to that of the GPU. In addition, the FPGA compilers reduce development time by an order of magnitude, at the expense of increased resource consumption. The roofline model also shows that both accelerators are equally limited by the input/output bandwidth to the host. A well-tuned accelerator-based codesign that identifies the parallelism, the computation, and the data patterns of different classes of algorithms will make it possible to maximize the performance of the combined GPU/FPGA system [1].
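
As an illustration of the roofline model used in the abstract, the following minimal Python sketch computes the attainable performance as the minimum of the compute ceiling and the product of bandwidth and operational intensity. The function names and all numeric values (peak throughput, host link bandwidth, intensities) are hypothetical placeholders chosen for the example; they are not measurements from the paper.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, intensity_flop_per_byte):
    """Roofline bound: min(compute ceiling, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gb_s * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Operational intensity (flop/byte) at which a kernel stops being I/O-bound."""
    return peak_gflops / bandwidth_gb_s


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT taken from the paper.
    PEAK_GFLOPS = 1000.0  # assumed raw peak of an accelerator
    HOST_BW_GB_S = 8.0    # assumed host-to-accelerator bandwidth

    ridge = ridge_point(PEAK_GFLOPS, HOST_BW_GB_S)
    print(f"ridge point: {ridge} flop/byte")

    for intensity in (0.5, 4.0, 125.0, 500.0):
        bound = attainable_gflops(PEAK_GFLOPS, HOST_BW_GB_S, intensity)
        regime = "I/O-bound" if intensity < ridge else "compute-bound"
        print(f"intensity {intensity:6.1f} flop/byte -> {bound:7.1f} GFLOP/s ({regime})")
```

Under these assumed numbers, any kernel with an operational intensity below the ridge point is capped by the host link rather than by the accelerator itself, which is the kind of I/O limit the abstract reports for both the GPU and the FPGA.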