Expressive array constructs in an embedded GPU kernel programming language

Authors: Koen Claessen; Mary Sheeran; Bo Joel Svensson
Affiliations: Chalmers University of Technology, Gothenburg, Sweden (all authors)
Venue: DAMP '12: Proceedings of the 7th Workshop on Declarative Aspects and Applications of Multicore Programming
Year: 2012

Abstract

Graphics Processing Units (GPUs) are powerful computing devices that, with the advent of CUDA and OpenCL, are becoming useful for general-purpose computations. Obsidian is an embedded domain-specific language that generates CUDA kernels from functional descriptions. A symbolic array construction allows us to guarantee that intermediate arrays are fused away. However, the current array construction has some drawbacks; in particular, arrays cannot be combined efficiently. To solve this problem, we add a new type of array, push arrays, to the existing Obsidian system. The two array types complement each other and enable the definition of combinators that both take arrays apart and combine them, while still producing efficient generated code. We demonstrate this extension to Obsidian on a sequence of sorting kernels, with good results. The case study also illustrates the use of combinators for expressing the structure of parallel algorithms. The work presented is preliminary, and the combinators still need to be generalised. However, the raw speed of the generated kernels bodes well.
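The contrast between the two array kinds can be sketched in a few lines. What follows is a minimal Python model, not Obsidian's API (Obsidian itself is embedded in Haskell and emits CUDA). It assumes the common formulation of a pull array as a length plus an index function, and of a push array as a length plus a producer that writes each element through a callback. All names (`Pull`, `Push`, `push`, `pull_concat`, `push_concat`, `ilv`, `evens`, `odds`, `zip_with`, `sort_pairs`) are hypothetical. Mapping over a pull array only composes functions, which is the fusion property. Concatenating pull arrays, however, forces a bounds test into every element access, whereas concatenating or interleaving push arrays only redirects the writes. `sort_pairs` combines both kinds: it takes an array apart with pull combinators and reassembles it with a push combinator, giving one compare-and-swap stage of the kind sorting networks are built from.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical model for illustration only; this is not Obsidian's API.


@dataclass
class Pull:
    """Pull array: a length plus an index function computing element i on demand."""
    length: int
    index: Callable[[int], Any]

    def map(self, f: Callable[[Any], Any]) -> "Pull":
        # Fusion: mapping composes functions, so no intermediate array is built.
        return Pull(self.length, lambda i: f(self.index(i)))

    def to_list(self) -> List[Any]:
        return [self.index(i) for i in range(self.length)]


@dataclass
class Push:
    """Push array: a length plus a producer that emits (index, value) writes."""
    length: int
    producer: Callable[[Callable[[int, Any], None]], None]

    def to_list(self) -> List[Any]:
        out: List[Any] = [None] * self.length

        def write(i: int, x: Any) -> None:
            out[i] = x

        self.producer(write)
        return out


def pull_from_list(xs: List[Any]) -> Pull:
    return Pull(len(xs), lambda i: xs[i])


def push(p: Pull) -> Push:
    # Turn a pull array into a push array that writes element i to position i.
    def producer(write):
        for i in range(p.length):
            write(i, p.index(i))
    return Push(p.length, producer)


def pull_concat(a: Pull, b: Pull) -> Pull:
    # Every element access must test which half it falls in; in a GPU kernel
    # this would be a conditional evaluated for every element.
    return Pull(a.length + b.length,
                lambda i: a.index(i) if i < a.length else b.index(i - a.length))


def push_concat(a: Push, b: Push) -> Push:
    # No per-element test: a writes where it always would, b writes shifted.
    def producer(write):
        a.producer(write)
        b.producer(lambda i, x: write(a.length + i, x))
    return Push(a.length + b.length, producer)


def ilv(a: Push, b: Push) -> Push:
    # Interleave two equal-length push arrays: a[i] -> 2i, b[i] -> 2i + 1.
    if a.length != b.length:
        raise ValueError("ilv expects arrays of equal length")

    def producer(write):
        a.producer(lambda i, x: write(2 * i, x))
        b.producer(lambda i, x: write(2 * i + 1, x))
    return Push(a.length + b.length, producer)


# Taking arrays apart is natural with pull arrays...
def evens(p: Pull) -> Pull:
    return Pull(p.length // 2, lambda i: p.index(2 * i))


def odds(p: Pull) -> Pull:
    return Pull(p.length // 2, lambda i: p.index(2 * i + 1))


def zip_with(f: Callable[[Any, Any], Any], a: Pull, b: Pull) -> Pull:
    return Pull(min(a.length, b.length), lambda i: f(a.index(i), b.index(i)))


# ...and putting them back together is natural with push arrays.
def sort_pairs(p: Pull) -> Push:
    """One compare-and-swap stage: order each adjacent pair (2i, 2i + 1)."""
    e, o = evens(p), odds(p)
    return ilv(push(zip_with(min, e, o)), push(zip_with(max, e, o)))


if __name__ == "__main__":
    xs = pull_from_list([5, 3, 8, 1])
    ys = xs.map(lambda x: 10 * x)
    print(pull_concat(xs, ys).to_list())              # [5, 3, 8, 1, 50, 30, 80, 10]
    print(push_concat(push(xs), push(ys)).to_list())  # [5, 3, 8, 1, 50, 30, 80, 10]
    print(sort_pairs(xs).to_list())                   # [3, 5, 1, 8]
```

In this sequential model the writes simply run one after another. In a generated GPU kernel one would instead expect the writes to be spread across threads, so the sketch illustrates the data flow and the cost of combining arrays rather than the execution strategy.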