JCudaMP: OpenMP/Java on CUDA

Authors:
Georg Dotzler;Ronald Veldema;Michael Klemm
Affiliations:
University of Erlangen-Nuremberg, Martensstr, Erlangen, Germany;University of Erlangen-Nuremberg, Martensstr, Erlangen, Germany;University of Erlangen-Nuremberg, Martensstr, Erlangen, Germany
Venue:
Proceedings of the 3rd International Workshop on Multicore Software Engineering
Year:
2010

Abstract

We present an OpenMP framework for Java that can exploit an available graphics card as an application accelerator. Dynamic languages (Java, C#, etc.) pose a challenge here because of their write-once-run-everywhere approach, which makes it impossible to assume at compile time whether an accelerator or graphics card will be available at run time, or of which type. We present an execution model that dynamically analyzes the running environment to find out what hardware is attached. Based on the results, it dynamically rewrites the bytecode and generates the necessary GPGPU code on the fly. Furthermore, we solve two additional problems caused by the combination of Java and CUDA. First, CUDA-capable hardware usually has little memory compared to main memory. However, as Java is a pointer-free language, array data can be stored in main memory and buffered in GPU memory. Second, CUDA requires data to be copied to and from the graphics card's memory explicitly. As modern languages use many small objects, this would involve many copy operations if done naively. The problem is exacerbated because Java implements multi-dimensional arrays as arrays of arrays. A clever copying technique and two new array packages allow for more efficient use of CUDA.
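
The copying problem mentioned above can be illustrated with a small sketch. The abstract does not describe the authors' copying technique or their array packages, so the following is only an assumed, generic approach: pack a Java array of arrays into one contiguous buffer plus a row-offset table, so that a single bulk transfer could replace one copy per row. The names `FlatArraySketch`, `Packed`, `pack`, and `unpack` are hypothetical and not taken from JCudaMP, and no actual CUDA call is made.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the JCudaMP implementation.
public final class FlatArraySketch {

    /** Packed form of a float[][]: one contiguous buffer plus row start offsets. */
    static final class Packed {
        final float[] data;   // all rows stored back to back
        final int[] offsets;  // offsets[i] = start of row i; last entry = total length

        Packed(float[] data, int[] offsets) {
            this.data = data;
            this.offsets = offsets;
        }
    }

    /** Copies every row into a single buffer; rows may have different lengths. */
    static Packed pack(float[][] rows) {
        int[] offsets = new int[rows.length + 1];
        for (int i = 0; i < rows.length; i++) {
            offsets[i + 1] = offsets[i] + rows[i].length;
        }
        float[] data = new float[offsets[rows.length]];
        for (int i = 0; i < rows.length; i++) {
            System.arraycopy(rows[i], 0, data, offsets[i], rows[i].length);
        }
        return new Packed(data, offsets);
    }

    /** Rebuilds the array of arrays, e.g. after results come back from a device. */
    static float[][] unpack(Packed p) {
        int rowCount = p.offsets.length - 1;
        float[][] rows = new float[rowCount][];
        for (int i = 0; i < rowCount; i++) {
            rows[i] = Arrays.copyOfRange(p.data, p.offsets[i], p.offsets[i + 1]);
        }
        return rows;
    }

    public static void main(String[] args) {
        float[][] matrix = { {1f, 2f, 3f}, {4f, 5f}, {6f} };
        Packed p = pack(matrix);
        System.out.println(Arrays.toString(p.data));
        System.out.println(Arrays.toString(p.offsets));
        System.out.println(Arrays.deepToString(unpack(p)));
    }
}
```

In this sketch, the packed `data` buffer is what a runtime would hand to a single host-to-device copy, while `offsets` lets generated kernel code locate row `i` without following Java references. Whether JCudaMP uses a layout like this is not stated in the abstract.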