Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning
International Journal of Parallel Programming
This work presents an approach to a hierarchical architecture for a set of linear algebra libraries with self-optimisation capacity. In previous works the optimisation of several routines was studied separately; here, the ideas applied to individual routines are combined with the classical hierarchy of linear algebra libraries. Each self-optimised library consists of the original routines of the library together with additional special routines that obtain information about the characteristics of the system and tune certain parameters of the original routines accordingly. The relationship between libraries at the different levels of the hierarchy is also strengthened: just as each routine calls routines at lower levels from its code, it uses the self-optimisation information of those lower-level routines to generate its own. Experiments have been carried out with routines at different levels and on different kinds of platforms under constant, variable and heterogeneous load. The results obtained allow us to conclude that the proposed methodology is valid for obtaining self-optimised linear algebra libraries.
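The architecture described in the abstract can be illustrated with a minimal sketch. The following Python code is not from the paper: the class name, the `install`/`run` interface, and the parameter-selection strategy are all hypothetical, chosen only to show the general idea of an empirical installation phase that times candidate values of a tuning parameter (e.g. a block size) on the target platform, and of a higher-level routine reusing the tuned information of a lower-level one.

```python
import time


class SelfOptimisedRoutine:
    """Hypothetical sketch of a self-optimised routine: the original
    kernel is wrapped with an installation routine that empirically
    selects a tuning parameter by timing candidates on this system."""

    def __init__(self, kernel, candidates, lower_levels=()):
        self.kernel = kernel              # kernel(n, param) -> result
        self.candidates = candidates      # candidate parameter values
        self.lower_levels = lower_levels  # already-installed lower-level
                                          # routines whose tuning info
                                          # this routine may reuse
        self.best = {}                    # problem size -> chosen param

    def install(self, sizes):
        # Empirical installation phase: benchmark every candidate
        # parameter at each representative problem size and keep the
        # fastest, so execution time pays the tuning cost only once.
        for n in sizes:
            timings = {}
            for p in self.candidates:
                t0 = time.perf_counter()
                self.kernel(n, p)
                timings[p] = time.perf_counter() - t0
            self.best[n] = min(timings, key=timings.get)

    def run(self, n):
        # At execution time, use the parameter tuned for the
        # installed problem size closest to the requested one.
        nearest = min(self.best, key=lambda s: abs(s - n))
        return self.kernel(n, self.best[nearest])
```

In the hierarchical scheme of the paper, a Level-3 BLAS-like routine would be installed first, and a higher-level routine (e.g. a factorisation calling it) would be constructed with the tuned lower-level routine in `lower_levels`, consulting its `best` table instead of re-benchmarking from scratch.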