A scalable auto-tuning framework for compiler optimization

Authors:
Ananta Tiwari;Chun Chen;Jacqueline Chame;Mary Hall;Jeffrey K. Hollingsworth
Affiliations:
University of Maryland, Department of Computer Science, College Park, 20740 USA;University of Utah, School of Computing, Salt Lake City, 84112 USA;University of Southern California, Information Sciences Institute, Marina del Rey, 90292 USA;University of Utah, School of Computing, Salt Lake City, 84112 USA;University of Maryland, Department of Computer Science, College Park, 20740 USA
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Citing 0
Cited 36

Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
Model-guided autotuning of high-productivity languages for petascale computing

Proceedings of the 18th ACM international symposium on High performance distributed computing
Speeding up Nek5000 with autotuning and specialization

Proceedings of the 24th ACM International Conference on Supercomputing
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Dynamic selection of implementation variants of sequential iterated runge-kutta methods with tile size sampling

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
A programming language interface to describe transformations and code generation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance

International Journal of High Performance Computing Applications
Automatic performance debugging of SPMD-style parallel programs

Journal of Parallel and Distributed Computing
AARTS: low overhead online adaptive auto-tuning

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Probabilistic auto-tuning for architectures with complex constraints

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Auto-tuning full applications: A case study

International Journal of High Performance Computing Applications
An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type

Journal of Computational and Applied Mathematics
Enhancing locality for recursive traversals of recursive structures

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Using machine learning to improve automatic vectorization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Journal of Parallel and Distributed Computing
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Language and compiler support for auto-tuning variable-accuracy algorithms

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Extendable pattern-oriented optimization directives

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Parameterized micro-benchmarking: an auto-tuning approach for complex applications

Proceedings of the 9th conference on Computing Frontiers
POET: a scripting language for applying parameterized source-to-source program transformations

Software—Practice & Experience
Auto-tuning for energy usage in scientific applications

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Polyhedra scanning revisited

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Panacea: towards holistic optimization of MapReduce applications

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection

CC'12 Proceedings of the 21st international conference on Compiler Construction
Extendable pattern-oriented optimization directives

ACM Transactions on Architecture and Code Optimization (TACO)
Portable section-level tuning of compiler parallelized applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A multi-objective auto-tuning framework for parallel codes

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Implementing an affordable high-performance computing for teaching-oriented computer science curriculum

ACM Transactions on Computing Education (TOCE)
AutoTune: a plugin-driven approach to the automatic tuning of parallel applications

PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing
Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Tile size selection revisited

ACM Transactions on Architecture and Code Optimization (TACO)
On Expressing Strategies for Directive-Driven Multicore Programming Models

Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
Controlling a complete hardware synthesis toolchain with LARA aspects

Microprocessors & Microsystems

Abstract

We describe a scalable, general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony's parallel search backend with the CHiLL compiler transformation framework to generate, in parallel, a set of alternative implementations of computation kernels and automatically select the best-performing one. For the tested kernels, the resulting compiler-generated code achieves performance comparable to the fully automated version of the ATLAS library. For various kernels, it runs 1.4 to 3.6 times faster than code from the native Intel compiler without search. Our search algorithm evaluates different combinations of compiler optimizations simultaneously and converges to solutions in only a few tens of search steps.
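
The idea of evaluating several optimization combinations at once and keeping the best can be illustrated with a minimal sketch. It does not use the Active Harmony or CHiLL APIs, and it is not the search algorithm from the paper: it is a plain parallel neighborhood search. The parameter names (tile size, unroll factor), the function names, and the synthetic cost function are all assumptions made for illustration; a real tuner would instead generate, compile, and time each code variant.

```python
from concurrent.futures import ThreadPoolExecutor


def synthetic_cost(config):
    """Hypothetical stand-in for compiling and timing one code variant.

    Lower is better; the minimum is placed at tile=32, unroll=4 by construction.
    """
    tile, unroll = config
    return (tile - 32) ** 2 / 64.0 + (unroll - 4) ** 2 + 1.0


def neighbors(center, space):
    """Return configurations one step from `center` along each parameter axis."""
    result = []
    for axis, values in enumerate(space):
        idx = values.index(center[axis])
        for step in (-1, 1):
            j = idx + step
            if 0 <= j < len(values):
                candidate = list(center)
                candidate[axis] = values[j]
                result.append(tuple(candidate))
    return result


def parallel_search(space, start, cost=synthetic_cost, workers=4, max_steps=50):
    """Evaluate all neighbors of the current best in parallel at each step."""
    best, best_cost = start, cost(start)
    steps = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while steps < max_steps:
            batch = neighbors(best, space)
            costs = list(pool.map(cost, batch))  # one search step, run concurrently
            steps += 1
            candidate_cost, candidate = min(zip(costs, batch))
            if candidate_cost >= best_cost:
                break  # no neighbor improves on the current best
            best, best_cost = candidate, candidate_cost
    return best, best_cost, steps


if __name__ == "__main__":
    # Assumed search space: allowed tile sizes and unroll factors.
    space = ([8, 16, 32, 64, 128], [1, 2, 4, 8])
    config, cost_value, steps = parallel_search(space, start=(8, 1))
    print(config, cost_value, steps)
```

Each loop iteration counts as one search step regardless of how many variants are evaluated in it, which is why a parallel backend can converge in few steps even when it examines many configurations.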