Bridging the GPGPU-FPGA efficiency gap
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Massively parallel programming models used as hardware description languages: the OpenCL case
Proceedings of the International Conference on Computer-Aided Design
OpenCL memory infrastructure for FPGAs (abstract only)
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
High-level synthesis: productivity, performance, and software constraints
Journal of Electrical and Computer Engineering - Special issue on ESL Design Methodology
Cyfield-RISP: generating dynamic instruction set processors for reconfigurable hardware using OpenCL
ICANN'12 Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part I
A methodology for efficient use of OpenCL, ESL and FPGAs in multi-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Efficient compilation of CUDA kernels for high-performance computing on FPGAs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.01 |
This work presents the Open Reconfigurable Computing Language (OpenRCL) system designed to enable low-power high-performance reconfigurable computing with imperative programming language such as C/C++. The key idea is to expose the FPGA platform as a compiler target for applications expressed in the OpenCL paradigm. To this end, we present a combination of low-level virtual machine instruction set, execution model, many-core architecture, and associated compiler to achieve high performance and power efficiency by exploiting the FPGA’s distributed memories and abundant hardware structures (such as DSP blocks, long carry-chains, and registers). Our resulting OpenRCL system not only allows programmers to easily express parallelism through the API defined in the OpenCL standard but also supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control. An OpenRCL prototype machine with 30 processing nodes was implemented using a Virtex-5 (XCV5LX155T-2) FPGA. For the well-known Parallel Prefix Sum (Scan) problem, comparing the runtime of the same problem on a GeForce 9400m using the OpenCL SDK from Apple Inc., the OpenRCL machine demonstrates comparable performance with a 5x reduction in core power consumption.