Annotation-based empirical performance tuning using Orio

Authors:
Albert Hartono;Boyana Norris;P. Sadayappan
Affiliations:
Dept. of Computer Science and Engg., Ohio State University, Columbus, 43210-1277, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Illinois 60439-4844, USA;Dept. of Computer Science and Engg., Ohio State University, Columbus, 43210-1277, USA
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Citing 0
Cited 9

Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I

Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments
Proceedings of the 7th international conference on Autonomic computing

Speeding up Nek5000 with autotuning and specialization
Proceedings of the 24th ACM International Conference on Supercomputing

A programming language interface to describe transformations and code generation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing

Probabilistic auto-tuning for architectures with complex constraints
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era

Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications

Run-time automatic performance tuning for multicore applications
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I

Loop transformation recipes for code generation and auto-tuning
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Towards fully automatic auto-tuning: Leveraging language features of Chapel
International Journal of High Performance Computing Applications

Abstract

For many scientific applications, significant time is spent in tuning codes for a particular high-performance architecture. Tuning approaches range from the relatively nonintrusive (e.g., by using compiler options) to extensive code modifications that attempt to exploit specific architecture features. Intrusive techniques often result in code changes that are not easily reversible, and can negatively impact readability, maintainability, and performance on different architectures. We introduce an extensible annotation-based empirical tuning system called Orio that is aimed at improving both performance and productivity. It allows software developers to insert annotations in the form of structured comments into their source code to trigger a number of low-level performance optimizations on a specified code fragment. To maximize the performance tuning opportunities, the annotation processing infrastructure is designed to support both architecture-independent and architecture-specific code optimizations. Given the annotated code as input, Orio generates many tuned versions of the same operation and empirically evaluates the alternatives to select the best performing version for production use. We have also enabled the use of the Pluto automatic parallelization tool in conjunction with Orio to generate efficient OpenMP-based parallel code. We describe our experimental results involving a number of computational kernels, including dense array and sparse matrix operations.
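
The generate-and-measure loop described in the abstract can be illustrated with a minimal Python sketch. This is not Orio's annotation syntax or API; the names `make_blocked_sum` and `tune`, the sum-of-squares kernel, and the block-size parameter are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of producing several variants of one operation, timing each, and keeping the fastest.

```python
import timeit


def make_blocked_sum(block):
    """Build one variant of a sum-of-squares kernel (hypothetical example).

    `block` plays the role of a tuning parameter such as a tile size:
    every variant computes the same result but walks the data in
    chunks of a different length.
    """
    def kernel(data):
        total = 0.0
        for start in range(0, len(data), block):
            # Process one chunk of at most `block` elements.
            for x in data[start:start + block]:
                total += x * x
        return total
    return kernel


def tune(data, blocks, repeats=3):
    """Time every variant on `data` and return (best_block, timings).

    The minimum over `repeats` runs is used as the measurement for each
    variant, which reduces the influence of timing noise.
    """
    timings = {}
    for block in blocks:
        kernel = make_blocked_sum(block)
        runs = timeit.repeat(lambda k=kernel: k(data), number=1, repeat=repeats)
        timings[block] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings
```

Calling `tune(data, [1, 4, 16])` returns the block size with the lowest measured time together with all measurements. Which variant wins depends on the machine and the input, which is why the selection is made empirically.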