Synthesising graphics card programs from DSLs

Authors:
Luke Cartey;Rune Lyngsø;Oege de Moor
Affiliations:
University of Oxford, Oxford, United Kingdom;University of Oxford, Oxford, United Kingdom;University of Oxford, Oxford, United Kingdom
Venue:
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Year:
2012

Abstract

Over the last five years, graphics cards have become a tempting target for scientific computing thanks to their unrivaled peak performance, often producing a runtime speed-up of 10x to 25x over comparable CPU solutions. However, this speed-up can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture. In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies, we identify sets (partitions) of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of the approach with three case studies, showing significant performance improvement over equivalent CPU methods and similar efficiency to hand-tuned GPU implementations.
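To make the idea of partitions of independent elements concrete, the following is a minimal sketch in Python. It is not the authors' system or DSL. Edit distance is chosen here only as a familiar recursive function, and the names `edit_distance_recursive` and `edit_distance_wavefront` are hypothetical. The sketch assumes a schedule in which partition `d` holds every table cell `(i, j)` with `i + j == d`; each cell depends only on cells in partitions `d - 1` and `d - 2`, so all cells within one partition could in principle be computed by separate GPU threads.

```python
from functools import lru_cache


def edit_distance_recursive(a: str, b: str) -> int:
    """Recursive definition, in the style a paper might state it."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(dist(i - 1, j) + 1,
                   dist(i, j - 1) + 1,
                   dist(i - 1, j - 1) + cost)

    return dist(len(a), len(b))


def edit_distance_wavefront(a: str, b: str) -> int:
    """Same function, evaluated partition by partition (assumed schedule)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Partition d contains every cell (i, j) with i + j == d.
    for d in range(n + m + 1):
        lo, hi = max(0, d - m), min(n, d)
        # Cells in this inner loop read only partitions d - 1 and d - 2,
        # so they are mutually independent. Here they run sequentially;
        # on a GPU each iteration could be one thread.
        for i in range(lo, hi + 1):
            j = d - i
            if i == 0:
                table[i][j] = j
            elif j == 0:
                table[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                table[i][j] = min(table[i - 1][j] + 1,
                                  table[i][j - 1] + 1,
                                  table[i - 1][j - 1] + cost)
    return table[n][m]
```

Both functions return the same value for any pair of strings; only the evaluation order differs. The outer loop over `d` is the sequential part, while the inner loop is the part a synthesised GPU kernel would distribute across threads.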