Iterative compilation with kernel exploration

Authors:
D. Barthou;S. Donadio;A. Duchateau;W. Jalby;E. Courtois
Affiliations:
Université de Versailles, France;Bull SA Company, France and Université de Versailles, France;Université de Versailles, France;Université de Versailles, France;CAPS Entreprise, France
Venue:
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Year:
2006

Abstract

The increasing complexity of hardware mechanisms in recent processors makes high-performance code generation very challenging. One of the main issues for high performance is the optimization of memory accesses. General-purpose compilers, with no knowledge of the application context and only an approximate memory model, seem ill-suited to this task. Combining application-dependent optimizations on the source code with exploration of optimization parameters, as is done in ATLAS, has been shown to be one way to improve performance. Yet hand-tuned codes such as those in the MKL library still outperform ATLAS by a significant margin, and further effort is needed to bridge the gap between the performance obtained by automatic and manual optimizations. In this paper, a new iterative compilation approach for the generation of high-performance codes is proposed. Unlike ATLAS, this approach is not application-dependent. The idea is to separate the memory optimization phase from the computation optimization phase. The first step automatically finds all possible decompositions of the code into kernels. With datasets that fit into the cache and simplified memory accesses, these kernels are simpler to optimize, either with the compiler, at source level, or with a dedicated code generator. The best decomposition is then found by a model-guided approach, which performs the required memory optimizations on the source code. Exploration of optimization sequences and their parameters is achieved with a meta-compilation language, the X language. The first results on linear algebra codes for Itanium show that the performance obtained narrows the gap with that of highly optimized hand-tuned codes.
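
The separation described in the abstract can be illustrated with a small sketch. This is not the authors' tool or the X language. It is a hypothetical Python example that assumes a square matrix multiply as the target code, a `kernel` function working on one tile (standing in for a cache-resident kernel), and a search over tile sizes. The search here uses plain empirical timing, not the model-guided selection the paper describes. All names (`kernel`, `tiled_matmul`, `explore`) are illustrative assumptions.

```python
import time


def kernel(A, B, C, i0, j0, k0, t, n):
    """Multiply one t x t tile; its operands are assumed small enough to fit in cache."""
    for i in range(i0, min(i0 + t, n)):
        Ai, Ci = A[i], C[i]
        for k in range(k0, min(k0 + t, n)):
            a, Bk = Ai[k], B[k]
            for j in range(j0, min(j0 + t, n)):
                Ci[j] += a * Bk[j]


def tiled_matmul(A, B, t):
    """Memory-level loop nest: walks the tiles and delegates the arithmetic to the kernel."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, t):
        for k0 in range(0, n, t):
            for j0 in range(0, n, t):
                kernel(A, B, C, i0, j0, k0, t, n)
    return C


def explore(A, B, candidates):
    """Time each candidate tile size and return the fastest (empirical search, no model)."""
    best_t, best_time = None, float("inf")
    for t in candidates:
        start = time.perf_counter()
        tiled_matmul(A, B, t)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_t, best_time = t, elapsed
    return best_t
```

Calling `explore(A, B, [8, 16, 32])` on two square matrices stored as lists of lists returns whichever candidate tile size ran fastest on the current machine. The tile loops in `tiled_matmul` play the role of the memory optimization phase, while `kernel` is the part that would be tuned separately.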