GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations, or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that instead uses data parallelism to program GPUs for general-purpose uses. Programmers use a conventional imperative programming language and a library that provides only high-level data-parallel operations. No aspects of GPUs are exposed to programmers. The library implementation compiles the data-parallel operations on the fly to optimized GPU pixel shader code and API calls. We describe the compilation techniques used to do this. We evaluate the effectiveness of using data parallelism to program GPUs by providing results for a set of compute-intensive benchmarks. We compare the performance of Accelerator versions of the benchmarks against hand-written pixel shaders. The speeds of the Accelerator versions are typically within 50% of the speeds of hand-written pixel shader code. Some benchmarks significantly outperform their C counterparts: they run up to 18 times faster than C code on a CPU.
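The library-based approach described in the abstract can be illustrated with a small sketch. The following Python code is not Accelerator's API; every name in it (`Expr`, `parallel_array`, `compile_kernel`, `to_list`) is hypothetical. It assumes one plausible reading of "compiles the data-parallel operations on the fly": array operations only build an expression graph, and the graph is fused into a single per-element function when a result is requested. On a GPU that fused function would roughly correspond to a generated pixel shader; here it is an ordinary Python closure run on the CPU.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Node in a deferred data-parallel expression graph (hypothetical)."""
    op: str                          # "leaf", "add", or "mul"
    args: Tuple["Expr", ...] = ()    # operand nodes for non-leaf ops
    data: Tuple[float, ...] = ()     # element values for leaf nodes

    # Operators record the operation instead of computing it.
    def __add__(self, other: "Expr") -> "Expr":
        return Expr("add", (self, other))

    def __mul__(self, other: "Expr") -> "Expr":
        return Expr("mul", (self, other))


def parallel_array(values) -> Expr:
    """Wrap ordinary values as a leaf of the expression graph."""
    return Expr("leaf", data=tuple(values))


_BINARY = {"add": operator.add, "mul": operator.mul}


def compile_kernel(expr: Expr) -> Callable[[int], float]:
    """Fuse the whole graph into one function of the element index."""
    if expr.op == "leaf":
        data = expr.data
        return lambda i: data[i]
    left, right = (compile_kernel(arg) for arg in expr.args)
    fn = _BINARY[expr.op]
    return lambda i: fn(left(i), right(i))


def length(expr: Expr) -> int:
    """Return the element count, rejecting mismatched operand shapes."""
    if expr.op == "leaf":
        return len(expr.data)
    lengths = {length(arg) for arg in expr.args}
    if len(lengths) != 1:
        raise ValueError("operand shapes do not match")
    return lengths.pop()


def to_list(expr: Expr) -> List[float]:
    """Force evaluation: compile once, then apply the kernel per element."""
    kernel = compile_kernel(expr)
    return [kernel(i) for i in range(length(expr))]


a = parallel_array([1, 2, 3])
b = parallel_array([10, 20, 30])
c = (a + b) * a        # builds a graph; nothing is computed yet
result = to_list(c)    # [11, 44, 99]
```

The point of deferring evaluation in this sketch is that `(a + b) * a` becomes one fused kernel rather than two passes with an intermediate array, which is the kind of opportunity an on-the-fly compiler can exploit without exposing any device details to the programmer.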