Optimising purely functional GPU programs

Authors:
Trevor L. McDonell;Manuel M.T. Chakravarty;Gabriele Keller;Ben Lippmeier
Affiliations:
University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia
Venue:
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
Year:
2013

Abstract

Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slow-down is not acceptable on target hardware that is usually chosen to achieve high performance. In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU. We present novel methods for implementing sharing recovery and array fusion, and demonstrate their effectiveness on a set of benchmarks.
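
The two problems named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its implementation language: it is a hypothetical Python model in which every name (`Delayed`, `amap`, `force`, `Const`, `Add`, `double_n`, `tree_size`, `dag_size`) is an assumption made for illustration. The first half shows fusion through "delayed" arrays, where two maps compose into one traversal with no intermediate array. The second half shows why lost sharing causes code explosion: an embedded term built by reusing one host-language variable is a small graph in memory, but a compiler that walks it as a tree sees exponentially many nodes unless it recovers the sharing, modelled here by object identity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


# --- Array fusion, modelled with delayed arrays (illustrative only) ---

@dataclass(frozen=True)
class Delayed:
    """An array described by a size and an index function, not stored data."""
    size: int
    index: Callable[[int], float]


def delay(xs: List[float]) -> Delayed:
    """Wrap a concrete list as a delayed array."""
    return Delayed(len(xs), lambda i: xs[i])


def amap(f: Callable[[float], float], arr: Delayed) -> Delayed:
    """Compose f with the index function; no intermediate array is built."""
    return Delayed(arr.size, lambda i: f(arr.index(i)))


def force(arr: Delayed) -> List[float]:
    """Materialise the array; the only place elements are actually computed."""
    return [arr.index(i) for i in range(arr.size)]


# --- Sharing in an embedded expression language (illustrative only) ---

@dataclass(frozen=True, eq=False)
class Const:
    value: float


@dataclass(frozen=True, eq=False)
class Add:
    left: object
    right: object


def double_n(n: int):
    """Build a term by reusing one host variable n times: e = Add(e, e)."""
    e = Const(1.0)
    for _ in range(n):
        e = Add(e, e)
    return e


def tree_size(e) -> int:
    """Node count when the term is traversed as a tree (sharing is lost)."""
    if isinstance(e, Const):
        return 1
    return 1 + tree_size(e.left) + tree_size(e.right)


def dag_size(e, seen: Optional[Set[int]] = None) -> int:
    """Node count when shared subterms are detected by object identity."""
    if seen is None:
        seen = set()
    if id(e) in seen:
        return 0
    seen.add(id(e))
    if isinstance(e, Const):
        return 1
    return 1 + dag_size(e.left, seen) + dag_size(e.right, seen)


if __name__ == "__main__":
    fused = amap(lambda x: x + 1, amap(lambda x: x * 2, delay([1.0, 2.0, 3.0])))
    print(force(fused))
    term = double_n(10)
    print(tree_size(term), dag_size(term))
```

In this model the gap between `tree_size` and `dag_size` grows exponentially with the number of reuses, which is the kind of blow-up that sharing recovery is meant to prevent, while `force` runs a single loop regardless of how many `amap` calls were chained.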