Fast Greedy Weighted Fusion

Authors: Ken Kennedy
Affiliations: Center for High Performance Software, Rice University, 6100 Main, Houston, Texas
Venue: International Journal of Parallel Programming
Year: 2001

Abstract

Loop fusion is an important compiler strategy for managing the memory hierarchy. By fusing loops that use the same data elements, a compiler can reduce the distance between accesses to the same datum and avoid costly cache misses. Unfortunately, the problem of optimal loop fusion for reuse has been shown to be NP-hard, so compilers must resort to heuristics to avoid unreasonably long compile times. Greedy strategies are often excellent heuristics that produce high-quality solutions quickly. We present an algorithm for greedy weighted fusion, in which the heaviest edge (the one with the most reuse) is selected for possible fusion at each step. The algorithm is fast in the sense that it takes O(V(E+V)) time, which is arguably optimal for producing the greedy solution. In addition, the algorithm requires only O(E) edge reweighting operations after fusions. This makes it suitable for the problem of enhancing cache reuse, for which the ideal reweighting operation is much more complex than addition. If each reweighting operation requires no more than O(V) time, the time bound of the overall fusion process remains O(V(E+V)).
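
The greedy strategy described in the abstract can be illustrated with a small sketch. This is not the paper's O(V(E+V)) algorithm: it is a naive version that rebuilds the fused graph on every step, and it only shows the selection rule (try the heaviest edge first, skip it if fusion is illegal). Everything in it is an assumption made for illustration: the function name `greedy_weighted_fusion`, the input encoding, the legality rule (a fusion is rejected when a fusion-preventing dependence lies on a path between the two loops), and the use of plain addition to reweight merged edges.

```python
def greedy_weighted_fusion(num_loops, deps, weights, bad=()):
    """Naive greedy weighted fusion (illustrative sketch only).

    num_loops: loops are numbered 0..num_loops-1 in program order.
    deps:      directed dependence edges (u, v); assumed acyclic.
    weights:   {(u, v): reuse} on unordered loop pairs.
    bad:       subset of deps marked fusion-preventing (assumed encoding).
    Returns the fused groups as a sorted list of sorted lists.
    """
    bad = set(bad)
    cluster = list(range(num_loops))  # cluster[i] = representative of loop i

    def reachable(start, adj):
        # Depth-first search; the result always contains `start`.
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    while True:
        # Rebuild the graph over the current clusters.
        succ, pred, bad_c, w = {}, {}, set(), {}
        for (u, v) in deps:
            cu, cv = cluster[u], cluster[v]
            if cu != cv:
                succ.setdefault(cu, set()).add(cv)
                pred.setdefault(cv, set()).add(cu)
                if (u, v) in bad:
                    bad_c.add((cu, cv))
        for (u, v), wt in weights.items():
            cu, cv = cluster[u], cluster[v]
            if cu != cv:
                key = (min(cu, cv), max(cu, cv))
                w[key] = w.get(key, 0) + wt  # reweighting by addition (assumed)

        fused = False
        # Try edges from heaviest to lightest; ties broken by cluster ids.
        for (a, b), _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0])):
            if a in reachable(b, succ):
                a, b = b, a  # orient so that any path runs from a to b
            # Every cluster on a path from a to b must be fused along with them.
            region = (reachable(a, succ) & reachable(b, pred)) | {a, b}
            if any(x in region and y in region for (x, y) in bad_c):
                continue  # a fusion-preventing edge lies on a path: skip
            rep = min(region)
            cluster = [rep if c in region else c for c in cluster]
            fused = True
            break
        if not fused:
            break

    groups = {}
    for loop, c in enumerate(cluster):
        groups.setdefault(c, []).append(loop)
    return sorted(groups.values())


# Hypothetical example: four loops, one fusion-preventing dependence (1, 2).
deps = {(0, 1), (1, 2), (0, 2), (2, 3)}
weights = {(0, 2): 10, (2, 3): 6, (0, 1): 4, (1, 3): 1}
with_bad = greedy_weighted_fusion(4, deps, weights, bad={(1, 2)})
without_bad = greedy_weighted_fusion(4, deps, weights)
```

In this example the heaviest edge (0, 2) is rejected because the fusion-preventing dependence (1, 2) lies on a path from loop 0 to loop 2, so the sketch fuses the next-heaviest legal pairs instead. Collapsing every cluster on a path between the two endpoints keeps the fused graph acyclic, which is why `region` is merged as a whole rather than only the two endpoints.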