The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Fast, effective dynamic compilation
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Efficient incremental run-time specialization for free
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming
A Feasibility Study in Iterative Compilation
ISHPC '99 Proceedings of the Second International Symposium on High Performance Computing
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
ADAPT: Automated De-Coupled Adaptive Program Transformation
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Combining Optimization for Cache and Instruction-Level Parallelism
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
The Journal of Supercomputing
Iterative compilation for energy reduction
Journal of Embedded Computing - Cache exploitation in embedded systems
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ACM Transactions on Programming Languages and Systems (TOPLAS)
Programming with relaxed synchronization
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
Hi-index | 0.00 |
In this paper, we give an overview of a novel approach to the problem of how to select compiler optimizations, their parameters, and the order in which to employ them. In particular, we concentrate on the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Next we discuss how to incorporate static models as a means to filter out certain combinations of transformations that are unlikely to produce good results. Finally, we show that the approach is applicable to real programs by employing the technique to three SPECfp benchmarks.