Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python that Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than equivalent CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well-optimized CUDA code.
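
To make the programming model concrete, the sketch below shows Copperhead-style flat and nested data parallel code in the spirit of the paper's axpy and sparse matrix-vector examples. The @cu decorator, the map/sum/gather primitives, and the places mechanism for selecting an execution target follow the paper's description, though exact API details may vary between Copperhead releases.

    # A minimal sketch, assuming the copperhead package described in
    # the paper is installed.
    from copperhead import *
    import numpy as np

    @cu
    def axpy(a, x, y):
        # Flat data parallelism: elementwise a*x + y over two arrays.
        return map(lambda xi, yi: a * xi + yi, x, y)

    @cu
    def spmv_csr(vals, cols, x):
        # Nested data parallelism: a CSR sparse matrix-vector product.
        # Each row is itself a data parallel dot product (spvv), and
        # spmv_csr maps it over the nested sequence of rows.
        def spvv(row_vals, row_cols):
            z = gather(x, row_cols)
            return sum(map(lambda a, b: a * b, row_vals, z))
        return map(spvv, vals, cols)

    x = np.arange(256, dtype=np.float64)
    y = np.ones(256, dtype=np.float64)

    # Selecting a CUDA execution place directs the runtime to compile
    # the primitives to CUDA and move data to and from the GPU.
    with places.gpu0:
        z = axpy(2.0, x, y)

The same decorated function can also be run on other execution places the paper describes, such as an interpreted Python place, which is one way Copperhead programs interoperate with ordinary Python code during development and debugging.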