Domain decomposition: parallel multilevel methods for elliptic partial differential equations
The science of deriving dense linear algebra algorithms
ACM Transactions on Mathematical Software (TOMS)
Toward GPU accelerated topology optimization on unstructured meshes
Structural and Multidisciplinary Optimization
In this paper, we evaluate the performance of several implementations of the BLAS routines that operate on band matrices. Results on two different platforms show that the efficient implementation of these operations has not received enough attention. We also present implementations of two level 3 BLAS-like operations that are not included in the current BLAS specification: the product of a band matrix and a general matrix, and the product of two band matrices.
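To make the first of these two operations concrete, here is a minimal, unoptimized sketch of a band matrix times a general matrix. It is not the implementation evaluated in the paper. It assumes the standard BLAS band storage layout, in which entry A[i][j] of a matrix with kl subdiagonals and ku superdiagonals is stored at AB[ku + i - j][j] (0-based). The function names `dense_to_band` and `band_times_general` are hypothetical and chosen only for illustration.

```python
def dense_to_band(a, kl, ku):
    """Pack a dense m x n matrix into BLAS-style band storage.

    Assumed layout (0-based): ab[ku + i - j][j] = a[i][j]
    for max(0, j - ku) <= i <= min(m - 1, j + kl).
    """
    m, n = len(a), len(a[0])
    ab = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            ab[ku + i - j][j] = a[i][j]
    return ab


def band_times_general(ab, m, kl, ku, b):
    """Return C = A * B, where A (m x n) is given in band storage
    and B (n x p) is a dense matrix stored as a list of rows."""
    n = len(ab[0])
    p = len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for j in range(n):
        # Only rows inside the band of column j hold nonzero entries.
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            a_ij = ab[ku + i - j][j]
            for k in range(p):
                c[i][k] += a_ij * b[j][k]
    return c
```

The loops touch only the stored band, so the cost is roughly (kl + ku + 1) * n * p multiply-adds rather than the m * n * p of a dense product. A tuned routine would additionally block the computation for cache reuse, which this sketch does not attempt.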