Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel matrix multiplication algorithms, in which either matrix A or matrix B is shared in memory, for the massively multithreaded Fiteng1000 processor. We discuss how these algorithms are implemented on this many-threaded multi-core processor. To achieve good performance, three choices matter: the two-dimensional topology of the threads, the memory level in which the matrices are placed, and the sizes of the matrices. The parallel codes are written in C and assembly language under the OpenMP programming environment. Performance results on the Fiteng1000 processor show that the algorithms have good parallel performance and achieve near-peak performance.
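As a rough illustration of the first tuning choice, arranging threads in a two-dimensional grid, here is a minimal sketch assuming row-major matrices and a hypothetical `pr` x `pc` thread grid in which thread `(r, c)` computes one block of C. It is not the paper's Fiteng1000 code. It uses plain C with an OpenMP pragma, no assembly kernels, no cache blocking, and no choice of memory level. Both A and B are shared read-only by all threads here, so it does not reproduce the paper's separate shared-A and shared-B variants. The names `matmul_2d_grid` and `block_range` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: half-open range [*lo, *hi) of the indices owned by
   part p when n indices are split as evenly as possible into np parts. */
static void block_range(int n, int np, int p, int *lo, int *hi)
{
    int base = n / np, rem = n % np;
    *lo = p * base + (p < rem ? p : rem);
    *hi = *lo + base + (p < rem ? 1 : 0);
}

/* Hypothetical sketch: C (m x n) = A (m x k) * B (k x n), all row-major.
   Threads form a pr x pc grid; grid cell (r, c) owns one block of C,
   so every element of C is written by exactly one thread.
   A and B are only read, so all threads can share them. */
void matmul_2d_grid(int m, int n, int k,
                    const double *A, const double *B, double *C,
                    int pr, int pc)
{
    assert(pr > 0 && pc > 0);
    /* Each (r, c) pair is one unit of work; with -fopenmp the pairs are
       distributed over the OpenMP threads, otherwise they run serially. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int r = 0; r < pr; ++r) {
        for (int c = 0; c < pc; ++c) {
            int i0, i1, j0, j1;
            block_range(m, pr, r, &i0, &i1); /* rows of C owned by grid row r */
            block_range(n, pc, c, &j0, &j1); /* columns owned by grid column c */
            for (int i = i0; i < i1; ++i) {
                for (int j = j0; j < j1; ++j) {
                    double sum = 0.0;
                    for (int q = 0; q < k; ++q)
                        sum += A[(size_t)i * k + q] * B[(size_t)q * n + j];
                    C[(size_t)i * n + j] = sum;
                }
            }
        }
    }
}
```

Compile with `-fopenmp` (GCC or Clang) to run the grid cells in parallel. Without it, the pragma is ignored and the same result is computed serially. Partitioning C rather than the shared inner dimension `k` avoids write races on C without any locking. The grid shape `pr` x `pc` is the knob that corresponds loosely to the thread-topology choice the abstract describes.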