In this paper, we study how array contraction can enhance cache locality and improve performance. In our previous work, we developed a memory minimization scheme, SFC, which combines loop shifting, loop fusion, and array contraction. SFC focuses on reducing memory requirements and, as a by-product, may enhance cache locality. Here we develop a memory cost model for SFC and present a fusion algorithm so that the predicted locality enhancement can be realized. Experimental results on both a real machine and a simulator demonstrate the effectiveness of array contraction in enhancing cache locality and improving performance.
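
To make the three transformations named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's SFC algorithm. The function names `unfused` and `fused_contracted` and the toy computation are assumptions chosen for illustration. The consumer loop reads `a[i + 1]`, so it cannot be fused with the producer directly; shifting the consumer by one iteration makes fusion legal, after which the temporary array `a` can be contracted to two scalars.

```python
def unfused(x):
    """Two separate loops communicating through a full temporary array a."""
    n = len(x)
    a = [0.0] * n
    for i in range(n):
        a[i] = 2.0 * x[i]            # producer loop writes a[i]
    y = [0.0] * max(n - 1, 0)
    for i in range(n - 1):
        y[i] = a[i] + a[i + 1]       # consumer loop reads a[i] and a[i + 1]
    return y


def fused_contracted(x):
    """Same result after loop shifting, loop fusion, and array contraction."""
    n = len(x)
    y = [0.0] * max(n - 1, 0)
    prev = 0.0                       # stands in for a[i - 1]
    for i in range(n):
        cur = 2.0 * x[i]             # stands in for a[i]
        if i >= 1:
            # Consumer iteration i - 1, shifted by one so that both of its
            # inputs (a[i - 1] and a[i]) have already been produced.
            y[i - 1] = prev + cur
        prev = cur
    return y


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(unfused(data))
    print(fused_contracted(data))
```

In this sketch the contracted version needs only two scalars instead of an `n`-element temporary, which is the kind of memory reduction that may also improve cache behavior.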