Simple optimizations for an applicative array language for graphics processors

  • Authors: Bradford Larsen
  • Affiliations: Tufts University, Medford, MA, USA
  • Venue: Proceedings of the sixth workshop on Declarative aspects of multicore programming
  • Year: 2011

Abstract

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of the GPU's shared memory cache, array fusion, and hoisting of nested parallel constructs. Because of the design of the language to which they are applied, these optimizations are simple to implement, yet they can yield large run-time speedups.
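
To make the array-fusion optimization concrete, here is a minimal sketch in Haskell. It is not the paper's implementation; the names (Exp, Arr, Map, fuse, eval) are hypothetical, and the sketch assumes a tiny first-order expression language with an elementwise map operation. Fusing adjacent maps rewrites two collective traversals into one, so no intermediate array needs to be materialized on the GPU.

    -- Hypothetical mini-language for illustration only; not the paper's AST.
    data Exp
      = Arr [Double]                 -- a literal input array
      | Map (Double -> Double) Exp   -- elementwise map over an array

    -- Fuse adjacent maps: map f (map g xs) ==> map (f . g) xs.
    -- On a GPU backend this avoids materializing the intermediate array.
    fuse :: Exp -> Exp
    fuse (Map f e) =
      case fuse e of
        Map g e' -> Map (f . g) e'   -- collapse the two traversals into one
        e'       -> Map f e'
    fuse e = e

    -- A reference interpreter, used here only to check that fusion
    -- preserves meaning.
    eval :: Exp -> [Double]
    eval (Arr xs)  = xs
    eval (Map f e) = map f (eval e)

    main :: IO ()
    main = do
      let prog = Map (+ 1) (Map (* 2) (Arr [1, 2, 3]))
      print (eval prog)        -- [3.0,5.0,7.0]
      print (eval (fuse prog)) -- same result, single traversal after fusion

The same local rewrite shape extends to other producer/consumer pairs (e.g. a zipWith consuming maps), which is in keeping with the abstract's claim that a small collective-operation language keeps such optimizations simple to implement.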