Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Memory-hierarchy management
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
On the Complexity of Loop Fusion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Profitable loop fusion and tiling using model-driven empirical search
Proceedings of the 20th annual international conference on Supercomputing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Automated transformation for performance-critical kernels
LCSD '07 Proceedings of the 2007 Symposium on Library-Centric Software Design
Hi-index | 0.00 |
Loop fusion is recognized as an effective program transformation for improving memory hierarchy performance. However, unconstrained loop fusion can lead to poor performance because of increased register pressure and cache conflict misses. The complex interaction between different levels of the memory hierarchy with the input program makes it very difficult to always make the right choice in fusing loops. In this paper, we present a cache-conscious analytical model for profitable loop fusion to be used with a constrained weighted fusion algorithm. We then extend the model to show its effectiveness in the context of an empirical tuning framework. A preliminary evaluation of the model is presented using hand experiments on four applications.