A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS).
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. ICS '97: Proceedings of the 11th International Conference on Supercomputing.
High-level semantic optimization of numerical codes. ICS '99: Proceedings of the 13th International Conference on Supercomputing.
High-performance parallel implicit CFD. Parallel Computing, special issue on parallel computing in aerospace.
Increasing temporal locality with skewing and recursive blocking. Proceedings of the 2001 ACM/IEEE Conference on Supercomputing.
Artificial Intelligence: A Modern Approach.
Iterative Methods for Sparse Linear Systems.
A practical automatic polyhedral parallelizer and locality optimizer. Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation.
Annotation-based empirical performance tuning using Orio. IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing.
Loop transformation recipes for code generation and auto-tuning. LCPC '09: Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing.
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; computational scientists are therefore often forced to choose between investing considerable time in tuning code and accepting lower performance. In this paper, we describe the first steps toward a fully automated system for optimizing the matrix algebra kernels that are a foundational part of many scientific applications. To generate highly optimized code from a high-level MATLAB prototype, we define a three-step approach. First, we have developed a compiler that converts a MATLAB script into simple C code. Second, we use the polyhedral optimization system Pluto to optimize that code for coarse-grained parallelism and locality simultaneously. Finally, we annotate the resulting code with performance-tuning directives and use the empirical performance-tuning system Orio to generate many tuned versions of the same operation using different optimization techniques, such as loop unrolling and memory alignment. Orio then performs an automated empirical search to select the best of the optimized code variants. We discuss performance results on two architectures.
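To make the pipeline concrete, the sketch below shows the kind of simple C kernel the MATLAB-to-C step might emit for a matrix-matrix product, together with one hand-unrolled variant of the sort an empirical tuner such as Orio would generate and time. The function names, row-major layout, and unroll factor are illustrative assumptions, not the actual output of our compiler or Orio's annotation syntax.

#include <stddef.h>

/* Naive triple loop, of the kind a MATLAB-to-C translator might emit
   for C = A*B (square n x n matrices, row-major; names and layout are
   illustrative assumptions, not the compiler's actual output). */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

/* One point in the tuning search space: the reduction loop unrolled by 4.
   An autotuner generates many such variants (different unroll factors,
   tile sizes, alignment hints), times each, and keeps the fastest. */
void matmul_unroll4(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            size_t k = 0;
            for (; k + 4 <= n; k += 4) {          /* unrolled body */
                s += A[i*n + k]     * B[k*n + j];
                s += A[i*n + k + 1] * B[(k + 1)*n + j];
                s += A[i*n + k + 2] * B[(k + 2)*n + j];
                s += A[i*n + k + 3] * B[(k + 3)*n + j];
            }
            for (; k < n; k++)                    /* remainder loop */
                s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

The best unroll factor depends on register pressure, instruction latencies, and cache behavior, which is why an automated empirical search over variants like these tends to outperform any single fixed heuristic across architectures.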