Over the last five years, graphics cards have become a tempting target for scientific computing thanks to their unrivaled peak performance, which often yields runtime speed-ups of 10x to 25x over comparable CPU solutions. However, this speed-up can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture. In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies, we identify sets (partitions) of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this approach with three case studies, which show significant performance improvements over equivalent CPU methods and efficiency similar to that of hand-tuned GPU implementations.
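The idea of partitioning a recursive function into sets of independent elements can be illustrated with a small sketch. This is not the paper's analysis or its generated code: the edit-distance recurrence, the hand-picked schedule `i + j`, and the names `DEPS`, `schedule`, `partitions` and `edit_distance` are all assumptions chosen for illustration, and plain sequential Python stands in for a GPU kernel.

```python
from collections import defaultdict

# Assumed example recurrence (edit distance): d(i, j) reads
# d(i-1, j), d(i, j-1) and d(i-1, j-1).
DEPS = [(-1, 0), (0, -1), (-1, -1)]


def schedule(i, j):
    """Hand-picked affine schedule (an assumption, not derived automatically).

    Every offset in DEPS lowers i + j by at least 1, so each cell only
    reads cells that belong to a strictly earlier step.
    """
    return i + j


def partitions(n, m):
    """Group the cells of an (n+1) x (m+1) table by schedule step."""
    groups = defaultdict(list)
    for i in range(n + 1):
        for j in range(m + 1):
            groups[schedule(i, j)].append((i, j))
    return [groups[t] for t in sorted(groups)]


def edit_distance(a, b):
    """Fill the table one partition at a time (wavefront order)."""
    n, m = len(a), len(b)
    d = {}
    for part in partitions(n, m):
        # Cells in one partition never read each other, so this inner
        # loop is the part that could run as a single parallel step.
        for i, j in part:
            if i == 0:
                d[i, j] = j
            elif j == 0:
                d[i, j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i, j] = min(d[i - 1, j] + 1,
                              d[i, j - 1] + 1,
                              d[i - 1, j - 1] + cost)
    return d[n, m]
```

Here each partition is an anti-diagonal of the table. A polyhedral code generator would emit loop nests that enumerate such sets directly rather than building them in a dictionary, which this sketch does only for readability.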