Sophisticated parallel matrix multiplication algorithms such as PDGEMM have a complex structure and are controlled by a large set of parameters, including blocking factors and the block sizes used for serial execution on the individual participating processors. Selecting these parameters so that the execution time is minimized, however, requires a deep understanding of both the parallel algorithm and the execution platform. In this article, we describe a simple mechanism that automatically selects a suitable set of parameters for PDGEMM and that leads to a minimum execution time in most cases.
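The general idea of choosing a blocking parameter automatically can be illustrated with a small empirical search. The sketch below is not the mechanism described in this article and does not call PDGEMM; it assumes a serial, pure-Python blocked matrix multiplication and simply times a few candidate block sizes, keeping the fastest. The names `blocked_matmul` and `select_block_size`, the candidate list, and the test matrices are hypothetical and chosen only for illustration.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one block of a by one block of b, accumulating into c.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def select_block_size(n, candidates, repeats=3):
    """Time every candidate block size; return (fastest block, all timings)."""
    # Hypothetical test matrices; any dense n x n input would do.
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            # Keep the best of several runs to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    best_block = min(timings, key=timings.get)
    return best_block, timings


if __name__ == "__main__":
    best_block, timings = select_block_size(n=64, candidates=[4, 8, 16, 32])
    for block, seconds in sorted(timings.items()):
        print(f"block={block:3d}  time={seconds:.4f}s")
    print("selected block size:", best_block)
```

Which block size wins depends on the machine and the problem size, so the selection has to be repeated per platform. A real tuner for PDGEMM would also have to account for the processor grid and the distribution block size, which this serial sketch leaves out.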