An introduction to the theory of lists
Proceedings of the NATO Advanced Study Institute on Logic of programming and calculi of discrete design
Algebraic identities for program calculation
The Computer Journal - Special issue on Lazy functional programming
Scans as Primitive Parallel Operations
IEEE Transactions on Computers
FPCA '93 Proceedings of the conference on Functional programming languages and computer architecture
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Programming parallel algorithms
Communications of the ACM
Communications of the ACM
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
The range test: a dependence test for symbolic, non-linear expressions
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Conventional and Uniqueness Typing in Graph Rewrite Systems
Proceedings of the 13th Conference on Foundations of Software Technology and Theoretical Computer Science
Programming language design issues
Proceedings of the DoD Sponsored Workshop on Design and Implementation of Programming Languages
Hybrid analysis: static & dynamic memory reference analysis
International Journal of Parallel Programming
Interprocedural parallelization analysis in SUIF
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
SAC: a functional array language for efficient multi-threaded execution
International Journal of Parallel Programming
Data parallel Haskell: a status report
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Regular, shape-polymorphic, parallel arrays in Haskell
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Expressive array constructs in an embedded GPU kernel programming language
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
PADL'07 Proceedings of the 9th international conference on Practical Aspects of Declarative Languages
Logical inference techniques for loop parallelization
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Financial software on GPUs: between Haskell and Fortran
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Nested data-parallelism on the gpu
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Optimising purely functional GPU programs
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming
Hi-index | 0.00 |
Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement. In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms. In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers. Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation. We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation. We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.