Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Iterative methods for solving linear systems
Iterative methods for solving linear systems
The implementation and evaluation of fusion and contraction in array languages
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
On the Complexity of Loop Fusion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
The Memory Bandwidth Bottleneck and its Amelioration by a Compiler
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Iterative compilation for energy reduction
Journal of Embedded Computing - Cache exploitation in embedded systems
The impact of global communication latency at extreme scales on Krylov methods
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hi-index | 0.00 |
Naive code generation from high-level languages that encourage modularity can give rise to large numbers of simple loops for array-based programs. Collective loop fusion and array contraction can be used on such codes to improve temporal locality and performance. The problem is typically formalised using a loop dependence graph (LDG), with solutions denoted by fusion partitions. Much previous work has concentrated on approaches to the abstract formulation. We present our technique called iterative collective loop fusion based on empirically evaluating different transformations, and show how it can provide speedups over existing approaches of up to 1.38. We also give results showing that applying such techniques to high-level languages can provide speedups of up to 2.45 over the original code, and outperforms an equivalent code in Fortran.