OpenMP extensions for FPGA accelerators

Authors:
Daniel Cabrera;Xavier Martorell;Georgi Gaydadjiev;Eduard Ayguade;Daniel Jiménez-González
Affiliations:
Barcelona Supercomputing Center, Barcelona, Spain and Universitat Politecnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain and Universitat Politecnica de Catalunya, Barcelona, Spain;Delft University of Technology, Delft, The Netherlands;Barcelona Supercomputing Center, Barcelona, Spain and Universitat Politecnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain and Universitat Politecnica de Catalunya, Barcelona, Spain
Venue:
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Year:
2009

Abstract

Reconfigurable computing is one of the paths to explore towards low-power supercomputing. However, programming reconfigurable devices is not easy, and significant research and development effort is still required to make it productive. In addition, using these devices as accelerators in multicore, SMP, and ccNUMA architectures adds a further level of programming complexity: the programmer must specify the offloading of tasks to the reconfigurable devices and ensure interoperability with current shared-memory programming paradigms such as OpenMP. This paper presents extensions to OpenMP 3.0 that aim to address this second challenge, together with an implementation in a prototype runtime system. With these extensions, the programmer can easily express the offloading of an already existing reconfigurable binary code (bitstream), while the complexities related to device configuration, bitstream loading, and data arrangement and movement to the device memory remain hidden. Our current prototype implementation targets SGI Altix systems with RASC blades (based on the Virtex 4 FPGA). We analyze the overheads introduced in this implementation and propose a hybrid host/device operational mode that hides some of these overheads, significantly improving application performance. The system is evaluated with a matrix multiplication kernel, including an estimation that considers different FPGA frequencies.
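
The hybrid host/device idea can be illustrated with a minimal sketch in C using only standard OpenMP 3.0 tasks. It is not the syntax proposed in the paper, which the abstract does not show. The function `mm_rows_device_stub` is a hypothetical stand-in for the offloaded bitstream and simply runs on the host here; `hybrid_matmul`, `mm_rows`, the matrix size `N`, and the row-wise split are likewise assumptions made for illustration.

```c
#include <assert.h>

#define N 8  /* assumed matrix dimension, chosen only for the sketch */

/* Multiply rows [row_begin, row_end) of A by B into C.
 * All matrices are N x N, stored row-major. */
static void mm_rows(const double *a, const double *b, double *c,
                    int row_begin, int row_end)
{
    for (int i = row_begin; i < row_end; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}

/* Hypothetical placeholder for the task that would be offloaded to the
 * FPGA. In this sketch it just reuses the host kernel, so no device,
 * bitstream, or data transfer is involved. */
static void mm_rows_device_stub(const double *a, const double *b, double *c,
                                int row_begin, int row_end)
{
    mm_rows(a, b, c, row_begin, row_end);
}

/* Hybrid mode sketch: the first device_rows rows go to the "device" task,
 * the remaining rows are computed by a host task at the same time. */
void hybrid_matmul(const double *a, const double *b, double *c,
                   int device_rows)
{
    assert(device_rows >= 0 && device_rows <= N);

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        mm_rows_device_stub(a, b, c, 0, device_rows);

        #pragma omp task
        mm_rows(a, b, c, device_rows, N);

        /* Wait for both halves before leaving the region. */
        #pragma omp taskwait
    }
}
```

Calling `hybrid_matmul(a, b, c, N / 2)` splits the rows evenly between the two tasks. Compiled without OpenMP support, the pragmas are ignored and both halves run one after the other with the same result. In a real system the split point would presumably be tuned to the relative speed of host and device, which is not something this sketch models.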