The introduction of auto-tuning techniques in shared-memory linear algebra routines is analyzed. Information gathered when the routines are installed is used at run time to make decisions that reduce the total execution time. The study is carried out with routines at different levels (matrix multiplication, LU and Cholesky factorizations, and solvers for symmetric and general linear systems) and with calls to multithreaded routines from the LAPACK and PLASMA libraries. Medium-sized NUMA and large cc-NUMA systems are used in the experiments. This variety of routines, libraries and systems allows us to draw general conclusions about the methodology to use for auto-tuning shared-memory linear algebra routines. The proposed methodology yields satisfactory execution times.
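The install-then-run scheme described in the abstract can be illustrated with a minimal sketch (not the authors' code): at installation time a kernel is benchmarked for several problem sizes and thread counts, and the fastest configuration per size is stored; at run time the stored table is consulted to pick the thread count for the size at hand. All function names here (`matmul`, `install_phase`, `run_phase`) are hypothetical, and the naive row-parallel multiply merely stands in for a tuned library kernel.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def matmul(A, B, n_threads):
    """Naive row-parallel matrix multiply; a stand-in for a tuned kernel."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    def rows(lo, hi):
        for i in range(lo, hi):
            Ci = C[i]
            for k in range(m):
                a, Bk = A[i][k], B[k]
                for j in range(p):
                    Ci[j] += a * Bk[j]
    chunk = (n + n_threads - 1) // n_threads
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        list(ex.map(lambda lo: rows(lo, min(lo + chunk, n)),
                    range(0, n, chunk)))
    return C

def timed(n, t):
    """Time one multiplication of two n-by-n matrices with t threads."""
    A = [[1.0] * n for _ in range(n)]
    t0 = time.perf_counter()
    matmul(A, A, t)
    return time.perf_counter() - t0

def install_phase(sizes, thread_counts):
    """Installation: keep the fastest thread count for each tested size."""
    return {n: min(thread_counts, key=lambda t: timed(n, t)) for n in sizes}

def run_phase(table, n):
    """Run time: use the configuration tuned for the nearest tested size."""
    nearest = min(table, key=lambda s: abs(s - n))
    return table[nearest]
```

A real installation stage would also average repeated timings and could record block sizes or library choices (e.g. LAPACK vs. PLASMA) alongside the thread count; the nearest-size lookup is the simplest possible run-time decision rule.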