Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python that Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than equivalent CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well-optimized CUDA code.
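
To make the programming model concrete, the sketch below shows Copperhead-style flat and nested data parallel code in the spirit of the paper's axpy and sparse matrix-vector examples. The @cu decorator, the map/sum/gather primitives, and the places mechanism for selecting an execution target follow the paper's description, though exact API details may vary between Copperhead releases.

    # A minimal sketch, assuming the copperhead package described in
    # the paper is installed.
    from copperhead import *
    import numpy as np

    @cu
    def axpy(a, x, y):
        # Flat data parallelism: elementwise a*x + y over two arrays.
        return map(lambda xi, yi: a * xi + yi, x, y)

    @cu
    def spmv_csr(vals, cols, x):
        # Nested data parallelism: a CSR sparse matrix-vector product.
        # Each row is itself a data parallel dot product (spvv), and
        # spmv_csr maps it over the nested sequence of rows.
        def spvv(row_vals, row_cols):
            z = gather(x, row_cols)
            return sum(map(lambda a, b: a * b, row_vals, z))
        return map(spvv, vals, cols)

    x = np.arange(256, dtype=np.float64)
    y = np.ones(256, dtype=np.float64)

    # Selecting a CUDA execution place directs the runtime to compile
    # the primitives to CUDA and move data to and from the GPU.
    with places.gpu0:
        z = axpy(2.0, x, y)

The same decorated function can also be run on other execution places the paper describes, such as an interpreted Python place, which is one way Copperhead programs interoperate with ordinary Python code during development and debugging.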