The optimization of the BLAS is discussed, with examples for the superscalar IBM RISC System/6000 (RS/6000). The suggested approach is to use block data structures based on store-by-block schemes, in which each submatrix block is stored contiguously in memory. We present results and an analysis of the optimization of DGEMM, the double-precision general matrix multiply. We also suggest how these results can be applied to the higher-level factorizations and to the other BLAS. The results show the advantages of using block data structures.
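To make the store-by-block idea concrete, here is a minimal sketch of such a layout and a blocked DGEMM (C += A * B) that works directly on it. It is not the paper's implementation or any BLAS API: the function names (`block_offset`, `to_block_storage`, `from_block_storage`, `block_kernel`, `dgemm_by_blocks`) are hypothetical, and the sketch assumes square nb x nb blocks, matrix dimensions that are multiples of nb, column-major order inside each block, and blocks ordered block-column by block-column. Because every block is contiguous, the inner kernel only ever touches three contiguous nb x nb regions of memory.

```python
# Hypothetical sketch (not the paper's code): a store-by-block layout plus a
# blocked DGEMM, C += A * B, operating directly on that layout.
# Assumptions: square nb x nb blocks, all dimensions are multiples of nb,
# each block is column-major and contiguous, and blocks are ordered
# block-column by block-column.


def block_offset(bi, bj, row_blocks, nb):
    """Index of the first element of block (bi, bj) in store-by-block order."""
    return (bj * row_blocks + bi) * nb * nb


def to_block_storage(a, m, n, nb):
    """Copy a column-major m x n matrix (flat list) into store-by-block order."""
    if m % nb or n % nb:
        raise ValueError("this sketch assumes m and n are multiples of nb")
    row_blocks = m // nb
    out = [0.0] * (m * n)
    for bj in range(n // nb):
        for bi in range(row_blocks):
            base = block_offset(bi, bj, row_blocks, nb)
            for j in range(nb):
                for i in range(nb):
                    out[base + j * nb + i] = a[(bj * nb + j) * m + bi * nb + i]
    return out


def from_block_storage(blk, m, n, nb):
    """Inverse of to_block_storage: back to a column-major flat list."""
    row_blocks = m // nb
    out = [0.0] * (m * n)
    for bj in range(n // nb):
        for bi in range(row_blocks):
            base = block_offset(bi, bj, row_blocks, nb)
            for j in range(nb):
                for i in range(nb):
                    out[(bj * nb + j) * m + bi * nb + i] = blk[base + j * nb + i]
    return out


def block_kernel(a, a_off, b, b_off, c, c_off, nb):
    """C_blk += A_blk * B_blk for three contiguous column-major nb x nb blocks.

    The innermost loop runs down a column, so A and C are read with stride 1.
    """
    for j in range(nb):
        for p in range(nb):
            b_pj = b[b_off + j * nb + p]
            for i in range(nb):
                c[c_off + j * nb + i] += a[a_off + p * nb + i] * b_pj


def dgemm_by_blocks(a, b, c, m, n, k, nb):
    """C += A * B with A (m x k), B (k x n), C (m x n) all in store-by-block order."""
    mb, kb = m // nb, k // nb
    for bj in range(n // nb):
        for bi in range(mb):
            c_off = block_offset(bi, bj, mb, nb)
            for bp in range(kb):
                block_kernel(a, block_offset(bi, bp, mb, nb),
                             b, block_offset(bp, bj, kb, nb),
                             c, c_off, nb)


if __name__ == "__main__":
    m, n, k, nb = 4, 4, 4, 2
    a = [float(i) for i in range(m * k)]  # column-major A
    b = [1.0 if i % (k + 1) == 0 else 0.0 for i in range(k * n)]  # identity
    c_blk = to_block_storage([0.0] * (m * n), m, n, nb)
    dgemm_by_blocks(to_block_storage(a, m, k, nb), to_block_storage(b, k, n, nb),
                    c_blk, m, n, k, nb)
    print(from_block_storage(c_blk, m, n, nb) == a)  # A * I equals A: prints True
```

Pure Python only shows the indexing, not the performance. A tuned DGEMM along these lines would presumably choose nb so that the working blocks fit in cache or registers on the target machine and replace `block_kernel` with an optimized inner kernel.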