Recent advances in computing allow us to take a new look at matrix multiplication. The key ideas are a decreasing interest in recursion, the development of processors with thousands (potentially millions) of processing units, and the influence of algebraic path problems. In this context, we propose a generalized matrix-matrix multiply-add (MMA) operation and illustrate its usefulness. Furthermore, we elaborate on the relationship between this generalization and the BLAS standard.
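The sketch below shows one way a generalized MMA could work. It assumes the operation has the form C ← C ⊕ (A ⊗ B) over a semiring, with the "addition" ⊕ and the "multiplication" ⊗ passed in as parameters. The function names `gmma` and `min_plus_closure` and their signatures are hypothetical and do not come from the paper. With ordinary (+, ×) the function computes C + A·B. With (min, +) it performs one relaxation step of an algebraic path problem, here all-pairs shortest paths.

```python
import math
import operator
from typing import Callable, List

Matrix = List[List[float]]


def gmma(a: Matrix, b: Matrix, c: Matrix,
         add: Callable[[float, float], float],
         mul: Callable[[float, float], float]) -> Matrix:
    """Hypothetical generalized multiply-add: returns C' where
    C'[i][j] = C[i][j] (+) (A[i][0] (*) B[0][j]) (+) ... (+) (A[i][m-1] (*) B[m-1][j]),
    with (+) = add and (*) = mul. The input C is not modified."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a) or len(c) != n or any(len(row) != p for row in c):
        raise ValueError("incompatible matrix shapes")
    out = [row[:] for row in c]          # copy C so the result is a new matrix
    for i in range(n):
        for j in range(p):
            acc = out[i][j]              # start the reduction from C[i][j]
            for k in range(m):
                acc = add(acc, mul(a[i][k], b[k][j]))
            out[i][j] = acc
    return out


def min_plus_closure(d: Matrix) -> Matrix:
    """Apply D <- D min (D + D) until a fixed point is reached.
    Assumes zeros on the diagonal and no negative cycles, so the
    result holds all-pairs shortest-path distances."""
    while True:
        nxt = gmma(d, d, d, min, operator.add)
        if nxt == d:
            return d
        d = nxt


if __name__ == "__main__":
    # Ordinary arithmetic: C + A*B
    print(gmma([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]],
               operator.add, operator.mul))   # [[20, 23], [44, 51]]

    # (min, +) semiring: shortest paths in a 3-node directed graph
    INF = math.inf
    d = [[0, 4, INF],
         [INF, 0, 1],
         [2, INF, 0]]
    print(min_plus_closure(d))                # [[0, 4, 5], [3, 0, 1], [2, 6, 0]]
```

Passing the two operations as parameters is what allows a single kernel to cover both cases. For comparison, BLAS `xGEMM` computes C := alpha·op(A)·op(B) + beta·C, so with ordinary arithmetic `gmma` matches GEMM with alpha = beta = 1 and no transposition. A production kernel would specialize and block these loops rather than call Python functions per element.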