The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Fast, effective dynamic compilation
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Combining loop transformations considering caches and scheduling
International Journal of Parallel Programming - Special issue: MICRO-29, 29th annual IEEE/ACM international symposium on microarchitecture
Efficient incremental run-time specialization for free
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Feasibility Study in Iterative Compilation
ISHPC '99 Proceedings of the Second International Symposium on High Performance Computing
Cache Models for Iterative Compilation
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
ADAPT: Automated De-Coupled Adaptive Program Transformation
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Combining Optimization for Cache and Instruction-Level Parallelism
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations
Transactions on High-Performance Embedded Architectures and Compilers I
Elastic computing: A portable optimization framework for hybrid computers
Parallel Computing
The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.01 |
In this paper, we give an overview of a novel approach to the problem of how to select compiler optimizations, their parameters, and the order in which to employ them. In particular, we concentrate on the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Next we discuss how to incorporate static models as a means to filter out certain combinations of transformations that are unlikely to produce good results. Finally, we show that the approach is applicable to real programs by employing the technique to three SPECfp benchmarks.