OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices

Authors:
Mingjie Lin;Ilia Lebedev;John Wawrzynek
Venue:
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Year:
2010

Abstract

This work presents the Open Reconfigurable Computing Language (OpenRCL) system, designed to enable low-power, high-performance reconfigurable computing with imperative programming languages such as C/C++. The key idea is to expose the FPGA platform as a compiler target for applications expressed in the OpenCL paradigm. To this end, we present a combination of a low-level virtual machine instruction set, an execution model, a many-core architecture, and an associated compiler that together achieve high performance and power efficiency by exploiting the FPGA’s distributed memories and abundant hardware structures (such as DSP blocks, long carry-chains, and registers). The resulting OpenRCL system not only allows programmers to easily express parallelism through the API defined in the OpenCL standard but also supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control. An OpenRCL prototype machine with 30 processing nodes was implemented on a Virtex-5 (XCV5LX155T-2) FPGA. For the well-known Parallel Prefix Sum (Scan) problem, the OpenRCL machine demonstrates performance comparable to that of a GeForce 9400M running the same problem with the OpenCL SDK from Apple Inc., with a 5x reduction in core power consumption.
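
For readers unfamiliar with the benchmark, the following is a minimal sketch of an exclusive Parallel Prefix Sum (Scan) using the common up-sweep/down-sweep (Blelloch) formulation. It is an illustration only: the abstract does not say which scan variant was benchmarked, and the function name `blelloch_exclusive_scan` is hypothetical rather than taken from OpenRCL or the OpenCL SDK. The iterations of each inner loop are independent of one another, which is what lets a many-core machine run them concurrently; here they run sequentially.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep phases.

    Hypothetical helper for illustration; not the OpenRCL implementation.
    """
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two with zeros (the identity for addition).
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in a balanced binary tree.
    # Iterations of the inner loop touch disjoint elements.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    # Drop the padding.
    return data[:n]


if __name__ == "__main__":
    print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

Each phase takes a number of steps logarithmic in the input length, and the two phases together perform a number of additions linear in the input length, which is why this formulation is a common choice for data-parallel targets.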