A T2 graph-reduction approach to fusion

Authors:
Troels Henriksen; Cosmin Eugen Oancea
Affiliations:
DIKU, Copenhagen, Denmark; DIKU, Copenhagen, Denmark
Venue:
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Year:
2013

Abstract

Fusion is one of the most important code transformations, as it has the potential to substantially reduce both the time overhead incurred in the memory hierarchy and, sometimes asymptotically, the space requirement. In functional languages, fusion is derived naturally and relatively easily as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms. In imperative languages, fusing producer-consumer loops requires dependence analysis on arrays, applied at the loop-nest level. Such analysis, however, has often been labeled a "heroic effort", and industrial compilers support it, if at all, only in its simplest and most conservative form. Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., only in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation. We show that this restriction is more conservative than needed, and present a structural-analysis technique, inspired by the T1-T2 transformation for reducible data flow, that enables fusion in some cases even when the producer is used in different consumers, without duplicating computation. We report an implementation of the fusion algorithm for a functional-core language named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
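To make the abstract's central point concrete, the Python sketch below contrasts three versions of a small program in which a producer `x = map f a` feeds two consumers, `map g x` and a sum-reduction of `x`. It is an illustration under stated assumptions, not the L0 compiler's algorithm. Inlining the producer into each consumer removes the intermediate array but runs `f` twice per element. A single fused loop that computes each `f(a[i])` once and feeds both consumers removes the array without duplicating work; this assumes both consumers traverse `x` in the same order and over the same length. The last function implements the textbook T1-T2 reducibility test for flow graphs (T1 deletes a self-loop; T2 merges a non-entry node that has a unique predecessor into that predecessor), which the abstract names as its inspiration. It does not show how the paper adapts T2 to fusion. All names (`f`, `g`, `calls`, `unfused`, `fused_by_inlining`, `fused_shared`, `is_reducible`) are hypothetical.

```python
from collections import Counter

# Hypothetical element functions; `calls` records how often each one runs.
calls = Counter()

def f(v):
    calls["f"] += 1
    return 2 * v

def g(v):
    calls["g"] += 1
    return v + 1

def unfused(a):
    """Producer materializes x, which two consumers then read."""
    x = [f(v) for v in a]   # producer: map f a
    y = [g(v) for v in x]   # consumer 1: map g x
    z = sum(x)              # consumer 2: reduce (+) 0 x
    return y, z

def fused_by_inlining(a):
    """x is gone, but f is recomputed in each consumer (duplicated work)."""
    y = [g(f(v)) for v in a]
    z = sum(f(v) for v in a)
    return y, z

def fused_shared(a):
    """One loop computes each f(a[i]) once and feeds both consumers,
    so x is never materialized and no work is duplicated."""
    y, z = [], 0
    for v in a:
        fv = f(v)        # producer element, computed exactly once
        y.append(g(fv))  # consumer 1
        z += fv          # consumer 2
    return y, z

def is_reducible(succ, entry):
    """Textbook T1-T2 test on a flow graph given as {node: successors}.
    T1: delete a self-loop n -> n.
    T2: if node n != entry has exactly one predecessor m, merge n into m.
    Assumes every node is a key and is reachable from entry.
    The graph is reducible iff repeated T1/T2 leave a single node."""
    succ = {n: set(ss) for n, ss in succ.items()}
    changed = True
    while changed:
        changed = False
        for n in list(succ):
            if n in succ[n]:                      # T1: drop self-loop
                succ[n].discard(n)
                changed = True
            preds = [m for m in succ if n in succ[m]]
            if n != entry and len(preds) == 1:    # T2: merge n into m
                m = preds[0]
                succ[m].discard(n)
                succ[m] |= succ.pop(n)  # m inherits n's successors
                changed = True
    return len(succ) == 1
```

For `a = [1, 2, 3]`, all three pipeline versions return `([3, 5, 7], 12)`, while `f` runs 3, 6 and 3 times respectively when `calls` is reset between runs. `is_reducible` accepts a diamond (A->B, A->C, B->D, C->D) and a simple loop (A->B, B->A), and rejects the classic two-entry loop A->B, A->C, B->C, C->B, where neither B nor C has a unique predecessor.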