Compiling collection-oriented languages onto massively parallel computers
Journal of Parallel and Distributed Computing - Massively parallel computation
Compiling nested data-parallel programs for shared-memory multiprocessors
ACM Transactions on Programming Languages and Systems (TOPLAS)
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Optimizing an ANSI C interpreter with superoperators
POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Programming parallel algorithms
Communications of the ACM
The quickhull algorithm for convex hulls
ACM Transactions on Mathematical Software (TOMS)
Nepal - Nested Data Parallelism in Haskell
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Work-efficient nested data-parallelism
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
SAC: a functional array language for efficient multi-threaded execution
International Journal of Parallel Programming
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Stack-based parallel recursion on graphics processors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
OptiX: a general purpose ray tracing engine
ACM SIGGRAPH 2010 papers
Nikola: embedding compiled GPU functions in Haskell
Proceedings of the third ACM Haskell symposium on Haskell
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Accelerating Haskell array codes with multicore GPUs
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Breaking the GPU programming barrier with the auto-parallelising SAC compiler
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Simple optimizations for an applicative array language for graphics processors
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A new method for GPU based irregular reductions and its application to k-means clustering
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
A GPU implementation of inclusion-based points-to analysis
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
GPU programming in a high level language: compiling X10 to CUDA
Proceedings of the 2011 ACM SIGPLAN X10 Workshop
Proceedings of the 2012 Haskell Symposium
Optimising purely functional GPU programs
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
A T2 graph-reduction approach to fusion
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Towards a streaming model for nested data parallelism
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Hi-index | 0.00 |
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. NESL is a first-order functional language that was designed to allow programmers to write irregular-parallel programs - such as parallel divide-and-conquer algorithms - for wide-vector parallel computers. This paper presents our port of the NESL implementation to work on GPUs and provides empirical evidence that nested data-parallelism (NDP) on GPUs significantly outperforms CPU-based implementations and matches or beats newer GPU languages that support only flat parallelism. While our performance does not match that of hand-tuned CUDA programs, we argue that the notational conciseness of NESL is worth the loss in performance. This work provides the first language implementation that directly supports NDP on a GPU.