In this paper, we present an efficient procedure for building a piecewise linear approximation of the speed function of a processor with a hierarchical memory structure. The procedure aims to minimize the experimental time needed to build this approximation. We demonstrate its efficiency through experiments on a local network of heterogeneous computers with three applications: a matrix multiplication and a Cholesky factorization that use the memory hierarchy efficiently, and a matrix multiplication that uses the memory hierarchy inefficiently.
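To make the idea of a piecewise linear speed function concrete, the following is a minimal illustrative sketch and not the procedure from the paper. It assumes a hypothetical callback `measure(size)` that runs a benchmark and returns the observed speed, and it uses plain recursive bisection: an interval is split further only when the speed measured at its midpoint deviates from the linear interpolation of its endpoints by more than a relative tolerance. The names `build_speed_model`, `predict`, `tol`, and `min_width` are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (problem size, measured speed)


def build_speed_model(measure: Callable[[float], float],
                      lo: float, hi: float,
                      tol: float = 0.05,
                      min_width: float = 1.0) -> List[Point]:
    """Return sorted breakpoints of a piecewise linear speed approximation.

    Illustrative bisection scheme (an assumption, not the paper's method):
    measure the midpoint of an interval and recurse into both halves only
    if the measurement disagrees with the linear interpolation.
    """
    lo, hi = float(lo), float(hi)
    points = {lo: measure(lo), hi: measure(hi)}

    def refine(a: float, b: float) -> None:
        if b - a <= min_width:
            return  # interval is already as fine as we allow
        mid = (a + b) / 2.0
        predicted = (points[a] + points[b]) / 2.0  # interpolation at mid
        actual = measure(mid)                      # one costly experiment
        points[mid] = actual
        if abs(actual - predicted) > tol * max(abs(actual), 1e-12):
            refine(a, mid)
            refine(mid, b)

    refine(lo, hi)
    return sorted(points.items())


def predict(model: List[Point], size: float) -> float:
    """Evaluate the piecewise linear model, clamping outside its range."""
    xs = [x for x, _ in model]
    if size <= xs[0]:
        return model[0][1]
    if size >= xs[-1]:
        return model[-1][1]
    i = bisect_right(xs, size)
    (x0, y0), (x1, y1) = model[i - 1], model[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)
```

In this sketch, each call to `measure` stands for one benchmark run, so the number of calls is a rough proxy for experimental time. Regions where speed varies almost linearly cost a single extra measurement, while measurements concentrate around sharp changes such as the point where a problem stops fitting in a given level of the memory hierarchy.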