This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops to improve cache locality. We develop a new memory cost model that analyzes data reuse in terms of both the cache and the TLB, and we use it to compute the tile size with or without array duplication. We decide whether to duplicate arrays for tiling by comparing the reuse factor exploited in each case. Preliminary results with several benchmark programs show that the transformed programs achieve speedups of 1.09 to 3.82 over the original programs.
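To make the term "tiling an imperfectly-nested loop" concrete, the following minimal sketch shows a generic textbook case, not the framework described in the abstract. It assumes a matrix multiply whose initialization statement sits outside the innermost loop (which makes the nest imperfect) and tiles it by sinking that statement into the innermost loop body under a guard. The function names and the `tile` parameter are hypothetical; a real compiler would derive the tile size from a cost model such as the one the paper proposes, whereas here it is simply passed in. Python is used only to show the loop structure, so no cache benefit should be expected from running it.

```python
def matmul_untiled(a, b, n):
    """Reference version: an imperfectly nested loop."""
    c = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c[i][j] = 0  # sits outside the k loop, so the nest is imperfect
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Tiled version (illustrative assumption: tile over j and k).

    The initialization is sunk into the tiled nest and guarded so that
    it runs exactly once per element, before any accumulation.
    """
    c = [[None] * n for _ in range(n)]
    for jj in range(0, n, tile):          # tile loop over columns of c
        for kk in range(0, n, tile):      # tile loop over the reduction index
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    if kk == 0:
                        c[i][j] = 0       # guarded copy of the sunk statement
                    for k in range(kk, min(kk + tile, n)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

Within one pair of tile loops, the same `tile` by `tile` block of `b` is reused for every row `i`, which is the kind of reuse that tiling aims to keep in the cache. Both versions add the products for each element in the same order of `k`, so they return identical results.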