A Compiler Framework for Tiling Imperfectly-Nested Loops

Authors:
Yonghong Song;Zhiyuan Li
Affiliations:
-;-
Venue:
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Year:
1999

Citing 14
Cited 1

A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Cache interference phenomena

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Experience with efficient array data flow analysis for array privatization

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
Precise miss analysis for program transformations with caches of arbitrary associativity

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Quantifying the multi-level nature of tiling interactions

International Journal of Parallel Programming
Transformations for imperfectly nested loops

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
On Estimating and Enhancing Cache Effectiveness

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research

On the interaction of tiling and automatic parallelization

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respective exploited reuse factors. The preliminary results with several benchmark programs show that the transformed programs achieve a speedup of 1.09 to 3.82 over the original programs.