The effect of cache models on iterative compilation for combined tiling and unrolling: Research Articles

Authors:
P. M. W. Knijnenburg;T. Kisuki;K. Gallivan;M. F. P. O'Boyle
Affiliations:
LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands;LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands;Department of Computer Science, Florida State University, U.S.A.;Institute for Computing Systems Architecture, Edinburgh University, U.K.
Venue:
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Year:
2004

Citing 0
Cited 8

Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy

Proceedings of the international symposium on Code generation and optimization
Reduction Transformations for Optimization Parameter Selection

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Profitable loop fusion and tiling using model-driven empirical search

Proceedings of the 20th annual international conference on Supercomputing
Positivity, posynomials and tile size selection

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Automatic creation of tile size selection models

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Journal of Parallel and Distributed Computing
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Iterative compilation, where we search for the best program transformation based on profile feedback information, has been highly successful in determining good program optimizations, typically outperforming all static approaches. However, this benefit has come at a price, namely, a large number of executions of the target program which in many cases is unacceptable. This paper is concerned with reducing the number of program executions needed by iterative compilation by incorporating static models. We examine how static models may be incorporated into a purely iterative approach by developing several characterized heuristics. We empirically explore the tradeoff between reducing the number of executions required and the ultimate performance achieved for differing parameters or slack factors. We focus on tiling and unrolling as transformations and cache models as a test case for the feasibility of this approach. We show that a highly accurate model, used as a filter to profiling and appropriately parameterized, can reduce the number of executions by 50%. We also show that using a simple model to rank transformations and profiling, only those with the highest ranking can reduce the number of executions even further up to 70%, in cases where there is only a limited number of profiles available. We conclude that a production compiler might perform best using the last approach, gaining the benefit of iterative compilation at a much reduced cost. Copyright © 2004 John Wiley & Sons, Ltd.