Model-guided empirical tuning of loop fusion

Authors:
Apan Qasem; Ken Kennedy
Affiliations:
Department of Computer Science, Texas State University, San Marcos, TX, USA; Department of Computer Science, Rice University, Houston, TX, USA
Venue:
International Journal of High Performance Systems Architecture
Year:
2008

Abstract

Loop fusion is recognised as an effective transformation for improving memory hierarchy performance. However, unconstrained loop fusion can lead to poor performance because of increased register pressure and cache conflict misses. In this paper, we present a cache-conscious analytical model for profitable loop fusion. We use this model to tune fusion parameters for different architectures through empirical search. Experiments on four different platforms for a set of applications show significant speedup over fully optimised code generated by state-of-the-art commercial compilers.
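
As a rough illustration of the ideas named in the abstract, the sketch below shows what fusing two loops looks like and how a model might narrow an empirical search. It is not the authors' model or search procedure: the function names (unfused, fused, model_guided_search), the feasibility predicate and the cost function are all hypothetical, chosen only to make the idea concrete. Python is used for brevity; it will not exhibit the cache or register effects the paper measures.

```python
def unfused(a, b):
    """Two separate loops: the temporary array c is written in full, then re-read."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * 2.0
    return d


def fused(a, b):
    """One fused loop: each intermediate value is consumed right after it is produced."""
    n = len(a)
    d = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]  # a scalar stands in for the temporary array c
        d[i] = t * 2.0
    return d


def model_guided_search(candidates, model_ok, measure):
    """Return the candidate with the lowest measured cost among those the model accepts.

    candidates: iterable of values for a hypothetical fusion parameter
    model_ok:   hypothetical analytical predicate (e.g. "footprint fits in cache")
    measure:    callable returning an observed cost, such as a run time
    """
    feasible = [c for c in candidates if model_ok(c)]
    if not feasible:
        raise ValueError("model rejected every candidate")
    return min(feasible, key=measure)
```

In this sketch the model only prunes the candidate set and the measurements make the final choice; in practice measure would compile and time a variant of the program for each remaining parameter value.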