Scan primitives for vector computers
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Algorithmic skeletons: structured management of parallel computation
Algorithmic skeletons: structured management of parallel computation
Building domain-specific embedded languages
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Template meta-programming for Haskell
Proceedings of the 2002 ACM SIGPLAN workshop on Haskell
On the Distribution Implementation of Aggregate Data Structures by Program Transformation
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
Monadic Presentations of Lambda Terms Using Generalized Inductive Types
CSL '99 Proceedings of the 13th International Workshop and 8th Annual Conference of the EACSL on Computer Science Logic
ACM SIGGRAPH 2004 Papers
Programming graphics processors functionally
Haskell '04 Proceedings of the 2004 ACM SIGPLAN workshop on Haskell
Proceedings of the tenth ACM SIGPLAN international conference on Functional programming
Simple unification-based type inference for GADTs
Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Data parallel Haskell: a status report
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Type checking with open type functions
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
Unembedding domain-specific languages
Proceedings of the 2nd ACM SIGPLAN symposium on Haskell
Type-safe observable sharing in Haskell
Proceedings of the 2nd ACM SIGPLAN symposium on Haskell
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
FPGA Circuit Synthesis of Accelerator Data-Parallel Programs
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
Nikola: embedding compiled GPU functions in Haskell
Proceedings of the third ACM Haskell symposium on Haskell
Regular, shape-polymorphic, parallel arrays in Haskell
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
IFL'09 Proceedings of the 21st international conference on Implementation and application of functional languages
Efficient parallel stencil convolution in Haskell
Proceedings of the 4th ACM symposium on Haskell
Expressive array constructs in an embedded GPU kernel programming language
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Proceedings of the 9th ECOOP Workshop on Reflection, AOP, and Meta-Data for Software Evolution
Parakeet: a just-in-time parallel accelerator for python
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Functional high performance financial IT: the hiperfit research center in copenhagen
TFP'11 Proceedings of the 12th international conference on Trends in Functional Programming
Dynamic symbolic computation for domain-specific language implementation
LOPSTR'11 Proceedings of the 21st international conference on Logic-Based Program Synthesis and Transformation
Generic conversions of abstract syntax representations
Proceedings of the 8th ACM SIGPLAN workshop on Generic programming
Parallel programming in Haskell almost for free: an embedding of intel's array building blocks
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Financial software on GPUs: between Haskell and Fortran
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Nested data-parallelism on the gpu
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Adaptive data parallelism for internet clients on heterogeneous platforms
Proceedings of the 8th symposium on Dynamic languages
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Optimising purely functional GPU programs
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
Counting and occurrence sort for GPUs using an embedded language
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Towards a functional run-time for dense NLA domain
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Towards a streaming model for nested data parallelism
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Embrace, defend, extend: a methodology for embedding preexisting DSLs
Proceedings of the 1st annual workshop on Functional programming concepts in domain-specific languages
Extensible sparse functional arrays with circuit parallelism
Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming
Synchronous digital circuits as functional programs
ACM Computing Surveys (CSUR)
Algebraic program semantics for supercomputing
Theories of Programming and Formal Methods
Hi-index | 0.00 |
Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work intensive and requires expert knowledge. To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of collective array operations. We embed this purely functional array language in Haskell with an online code generator for NVIDIA's CUDA GPGPU programming environment. We regard the embedded language's collective array operations as algorithmic skeletons; our code generator instantiates CUDA implementations of those skeletons to execute embedded array programs. This paper outlines our embedding in Haskell, details the design and implementation of the dynamic code generator, and reports on initial benchmark results. These results suggest that we can compete with moderately optimised native CUDA code, while enabling much simpler source programs.