Statistical Models for Automatic Performance Tuning

Authors:
Rich Vuduc; James Demmel; Jeff Bilmes
Venue:
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Year:
2001



Abstract

Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate, at compile-time, by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. This paper applies statistical techniques to exploit the large amount of performance data collected during the search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct run-time decision rules, based on run-time inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware platforms.
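To make the idea of stopping a search early more concrete, here is a minimal sketch of one simple, distribution-free stopping rule. It is not the heuristic developed in the paper, which this abstract does not specify. It is a stand-in that assumes implementations are sampled uniformly at random, and it relies only on the fact that after t random draws the chance that none falls in the top fraction q of the search space is at most (1 - q)^t. The names `samples_needed`, `early_stop_search`, `top_fraction`, `alpha`, and `measure` are hypothetical and chosen for illustration only.

```python
import math
import random


def samples_needed(top_fraction, alpha):
    """Smallest t such that (1 - top_fraction)**t <= alpha.

    After t uniform random draws, the probability that no draw lands in
    the top `top_fraction` of implementations is at most alpha.
    """
    return math.ceil(math.log(alpha) / math.log(1.0 - top_fraction))


def early_stop_search(candidates, measure, top_fraction=0.05, alpha=0.05, seed=0):
    """Benchmark a random subset of candidates instead of all of them.

    `measure` is a hypothetical callback returning a performance score
    (higher is better), e.g. Mflop/s of one generated implementation.
    Returns (best candidate, its score, number of candidates measured).
    """
    rng = random.Random(seed)
    # Never measure more candidates than exist.
    budget = min(len(candidates), samples_needed(top_fraction, alpha))
    best, best_perf = None, float("-inf")
    # Sampling without replacement only lowers the miss probability,
    # so the (1 - q)^t bound remains valid, if conservative.
    for cand in rng.sample(candidates, budget):
        perf = measure(cand)
        if perf > best_perf:
            best, best_perf = cand, perf
    return best, best_perf, budget


if __name__ == "__main__":
    # Toy search space: 1000 "implementations" whose score equals their id.
    space = list(range(1000))
    best, perf, used = early_stop_search(space, measure=lambda c: float(c))
    print(f"measured {used} of {len(space)} candidates; best score {perf}")
```

Under these assumptions the search measures 59 of the 1000 toy candidates while keeping a 95% chance that the winner lies in the top 5%. Unlike the approach described in the abstract, this rule ignores the observed performance values themselves; a rule that uses the collected timing data could stop earlier or give guarantees in terms of performance rather than rank.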