Auto-tuning full applications: A case study

Authors:
Ananta Tiwari; Jeffrey K Hollingsworth; Chun Chen; Mary Hall; Chunhua Liao; Daniel J Quinlan; Jacqueline Chame
Affiliations:
Department of Computer Science, University of Maryland, College Park, MD, USA; Department of Computer Science, University of Maryland, College Park, MD, USA; School of Computing, University of Utah, Salt Lake City, UT, USA; School of Computing, University of Utah, Salt Lake City, UT, USA; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA; Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
Venue:
International Journal of High Performance Computing Applications
Year:
2011


Abstract

In this paper, we take a concrete step towards materializing our long-term goal of providing a fully automatic end-to-end tuning infrastructure for arbitrary program components and full applications. We describe a general-purpose offline auto-tuning framework and apply it to an application benchmark, SMG2000, a semi-coarsening multigrid on structured grids. We show that the proposed system first extracts computationally intensive loop nests into separate executable functions, a code transformation called outlining. The outlined loop nests are then tuned by the framework and subsequently integrated back into the application. Each loop nest is optimized through a series of composable code transformations, with the transformations parameterized by unbound optimization parameters that are bound during the tuning process. The values for these parameters are selected using a search-based auto-tuner, which performs a parallel heuristic search for the best-performing optimized variants of the outlined loop nests. We show that our system pinpoints a code variant that performs 2.37 times faster than the original loop nest. When the full application is run using the code variant found by the system, the application's performance improves by 27%.
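
The workflow described in the abstract (outline a loop nest, expose transformation parameters, search for the fastest variant) can be illustrated with a minimal sketch. This is not the authors' framework. The kernel `matmul_tiled`, the parameters `tile_i` and `tile_k`, and the helper `tune` are hypothetical names chosen for illustration, and a plain exhaustive sweep stands in for the paper's parallel heuristic search.

```python
import itertools
import time


def matmul_reference(a, b, n):
    """Untransformed loop nest, standing in for the original outlined function."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, n, tile_i, tile_k):
    """Same computation with the i and k loops tiled.

    tile_i and tile_k are the unbound optimization parameters (assumed names);
    the tuner binds them to concrete values.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile_i):
        for kk in range(0, n, tile_k):
            for i in range(ii, min(ii + tile_i, n)):
                row_c, row_a = c[i], a[i]
                for k in range(kk, min(kk + tile_k, n)):
                    aik, row_b = row_a[k], b[k]
                    for j in range(n):
                        row_c[j] += aik * row_b[j]
    return c


def time_variant(fn, repeats=3):
    """Return the best wall-clock time of fn over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def tune(n=24, tile_sizes=(4, 8, 24)):
    """Exhaustively time every (tile_i, tile_k) pair and return the fastest.

    Assumption: exhaustive search replaces the heuristic search of the paper.
    """
    a = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    b = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    expected = matmul_reference(a, b, n)
    results = {}
    for tile_i, tile_k in itertools.product(tile_sizes, repeat=2):
        # Each variant must compute the same result as the original loop nest.
        assert matmul_tiled(a, b, n, tile_i, tile_k) == expected
        results[(tile_i, tile_k)] = time_variant(
            lambda: matmul_tiled(a, b, n, tile_i, tile_k)
        )
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = tune()
    print("fastest (tile_i, tile_k):", best)
```

In a real setting the winning variant would be compiled code substituted back into the application, and the measured times would depend on the machine, so no particular configuration should be expected to win.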