Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning
International Journal of Parallel Programming
This work presents an approach to a hierarchical architecture for a set of linear algebra libraries with self-optimisation capacity. In previous works the optimisation of several routines was studied separately; here, the ideas applied to individual routines are combined with the classical hierarchy of linear algebra libraries. Each self-optimised library consists of the original routines of the library together with additional special routines that obtain information about the characteristics of the system and tune certain parameters of the original routines accordingly. The relationship between libraries at the different levels of the hierarchy is also strengthened: just as each routine calls routines at lower levels from its code, it uses the self-optimisation information of those lower-level routines to generate its own. Experiments have been carried out with routines at different levels and on different kinds of platforms under constant, variable and heterogeneous load. The results obtained allow us to conclude that the proposed methodology is valid for obtaining self-optimised linear algebra libraries.
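The architecture described in the abstract can be illustrated with a minimal sketch. The following Python code is not from the paper: the class name, the `install`/`run` interface, and the parameter-selection strategy are all hypothetical, chosen only to show the general idea of an empirical installation phase that times candidate values of a tuning parameter (e.g. a block size) on the target platform, and of a higher-level routine reusing the tuned information of a lower-level one.

```python
import time


class SelfOptimisedRoutine:
    """Hypothetical sketch of a self-optimised routine: the original
    kernel is wrapped with an installation routine that empirically
    selects a tuning parameter by timing candidates on this system."""

    def __init__(self, kernel, candidates, lower_levels=()):
        self.kernel = kernel              # kernel(n, param) -> result
        self.candidates = candidates      # candidate parameter values
        self.lower_levels = lower_levels  # already-installed lower-level
                                          # routines whose tuning info
                                          # this routine may reuse
        self.best = {}                    # problem size -> chosen param

    def install(self, sizes):
        # Empirical installation phase: benchmark every candidate
        # parameter at each representative problem size and keep the
        # fastest, so execution time pays the tuning cost only once.
        for n in sizes:
            timings = {}
            for p in self.candidates:
                t0 = time.perf_counter()
                self.kernel(n, p)
                timings[p] = time.perf_counter() - t0
            self.best[n] = min(timings, key=timings.get)

    def run(self, n):
        # At execution time, use the parameter tuned for the
        # installed problem size closest to the requested one.
        nearest = min(self.best, key=lambda s: abs(s - n))
        return self.kernel(n, self.best[nearest])
```

In the hierarchical scheme of the paper, a Level-3 BLAS-like routine would be installed first, and a higher-level routine (e.g. a factorisation calling it) would be constructed with the tuned lower-level routine in `lower_levels`, consulting its `best` table instead of re-benchmarking from scratch.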