Empirical performance-model driven data layout optimization

Authors:
Qingda Lu;Xiaoyang Gao;Sriram Krishnamoorthy;Gerald Baumgartner;J. Ramanujam;P. Sadayappan
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH;Department of Computer Science and Engineering, The Ohio State University, Columbus, OH;Department of Computer Science and Engineering, The Ohio State University, Columbus, OH;Department of Computer Science, Louisiana State University, Baton Rouge, LA;Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA;Department of Computer Science and Engineering, The Ohio State University, Columbus, OH
Venue:
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Year:
2004

Citing 14
Cited 0

A set of level 3 basic linear algebra subprograms

ACM Transactions on Mathematical Software (TOMS)
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
Static and Dynamic Locality Optimizations Using Integer Linear Programming

IEEE Transactions on Parallel and Distributed Systems
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Space-time trade-off optimization for a class of electronic structure calculations

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Global arrays: a portable "shared-memory" programming model for distributed memory computers

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A high-level approach to synthesis of high-performance codes for quantum chemistry

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A cellular computer to implement the kalman filter algorithm

A cellular computer to implement the kalman filter algorithm

Quantified Score

Hi-index	0.00

Visualization

Abstract

Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.