CnC-CUDA: declarative programming for GPUs

Authors:
Max Grossman;Alina Simion Sbîrlea;Zoran Budimlić;Vivek Sarkar
Affiliations:
Department of Computer Science, Rice University;Department of Computer Science, Rice University;Department of Computer Science, Rice University;Department of Computer Science, Rice University
Venue:
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Year:
2010

Citing 13
Cited 3

Generative communication in Linda

ACM Transactions on Programming Languages and Systems (TOPLAS)
Java Concurrency in Practice

Java Concurrency in Practice
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The rise and fall of High Performance Fortran: an historical object lesson

Proceedings of the third ACM SIGPLAN conference on History of programming languages
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
Scalable Parallel Programming with CUDA

Queue - GPU Computing
Intel threading building blocks

Intel threading building blocks
Declarative aspects of memory management in the concurrent collections parallel programming model

Proceedings of the 4th workshop on Declarative aspects of multicore programming
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
A type and effect system for deterministic parallel Java

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Proceedings of the 37th annual international symposium on Computer architecture

Habanero-Java: the new adventures of old X10

Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Adaptive input-aware compilation for graphics engines

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Efficient Mapping of Irregular C++ Applications to Integrated GPUs

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with 10's to 100's of cores per chip, and complex hardware designs to address the challenges of concurrency, energy efficiency and resiliency. Unlike previous generations of hardware evolution, this shift towards many-core computing will have a profound impact on software. These software challenges are further compounded by the need to enable parallelism in workloads and application domains that traditionally did not have to worry about multiprocessor parallelism in the past. A recent trend in mainstream desktop systems is the use of graphics processor units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. Unfortunately, hybrid programming models that support multithreaded execution on CPUs in parallel with CUDA execution on GPUs prove to be too complex for use by mainstream programmers and domain experts, especially when targeting platforms with multiple CPU cores and multiple GPU devices. In this paper, we extend past work on Intel's Concurrent Collections (CnC) programming model to address the hybrid programming challenge using a model called CnC-CUDA. CnC is a declarative and implicitly parallel coordination language that supports flexible combinations of task and data parallelism while retaining determinism. CnC computations are built using steps that are related by data and control dependence edges, which are represented by a CnC graph. The CnC-CUDA extensions in this paper include the definition of multithreaded steps for execution on GPUs, and automatic generation of data and control flow between CPU steps and GPU steps. Experimental results show that this approach can yield significant performance benefits with both GPU execution and hybrid CPU/GPU execution.