A Computational Approach to Edge Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
Scan primitives for vector computers
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
FPCA '93 Proceedings of the conference on Functional programming languages and computer architecture
Proceedings of the 26th annual conference on Computer graphics and interactive techniques
Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture
On the Distribution Implementation of Aggregate Data Structures by Program Transformation
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell
IFL '99 Selected Papers from the 11th International Workshop on Implementation of Functional Languages
Secrets of the Glasgow Haskell Compiler inliner
Journal of Functional Programming
Programming graphics processors functionally
Haskell '04 Proceedings of the 2004 ACM SIGPLAN workshop on Haskell
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Stream fusion: from lists to streams to nothing at all
ICFP '07 Proceedings of the 12th ACM SIGPLAN international conference on Functional programming
Designing efficient sorting algorithms for manycore GPUs
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Unembedding domain-specific languages
Proceedings of the 2nd ACM SIGPLAN symposium on Haskell
Type-safe observable sharing in Haskell
Proceedings of the 2nd ACM SIGPLAN symposium on Haskell
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
An integer programming framework for optimizing shared memory use on GPUs
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Nikola: embedding compiled GPU functions in Haskell
Proceedings of the third ACM Haskell symposium on Haskell
Regular, shape-polymorphic, parallel arrays in Haskell
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Accelerating Haskell array codes with multicore GPUs
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Simple optimizations for an applicative array language for graphics processors
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Implementing fusion-equipped parallel skeletons by expression templates
IFL'09 Proceedings of the 21st international conference on Implementation and application of functional languages
Efficient parallel stencil convolution in Haskell
Proceedings of the 4th ACM symposium on Haskell
Expressive array constructs in an embedded GPU kernel programming language
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Guiding parallel array fusion with indexed types
Proceedings of the 2012 Haskell Symposium
Nested data-parallelism on the gpu
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
A generic abstract syntax model for embedded languages
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Exploiting vector instructions with generalized stream fusio
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
Using circular programs for higher-order syntax: functional pearl
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
A T2 graph-reduction approach to fusion
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Hi-index | 0.00 |
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slow-down is not acceptable on target hardware that is usually chosen to achieve high performance. In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU. We present novel methods for implementing sharing recovery and array fusion, and demonstrate their effectiveness on a set of benchmarks.