The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and for solving triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which multiplies two n × n matrices in O(n^(log2 7)) ≈ O(n^2.81) operations rather than O(n^3), and which is now recognized as a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given, and their significance is explained and illustrated with numerical experiments. We conclude that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
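To illustrate the building block these fast BLAS3 rest on, here is a minimal pure-Python sketch of Strassen's method for square matrices. It is not the authors' implementation. The helper names, the restriction to square matrices, and the hypothetical default `cutoff` of 64 (at or below which, and for odd dimensions, the sketch falls back to the conventional triple loop) are assumptions made for brevity. Each recursion level replaces eight half-size products with seven, which gives the recurrence T(n) = 7T(n/2) + O(n^2) and hence O(n^(log2 7)) operations.

```python
import random
from typing import List

Matrix = List[List[float]]


def conventional_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Classical O(n^3) product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A: Matrix, B: Matrix) -> Matrix:
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M: Matrix):
    """Return the four h x h quadrants of a 2h x 2h matrix."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def join(C11: Matrix, C12: Matrix, C21: Matrix, C22: Matrix) -> Matrix:
    """Reassemble four quadrants into a single matrix."""
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def strassen(A: Matrix, B: Matrix, cutoff: int = 64) -> Matrix:
    """Strassen product of square matrices (hypothetical cutoff default).

    Falls back to the conventional method when n <= cutoff or n is odd.
    """
    n = len(A)
    if n <= cutoff or n % 2 == 1:
        return conventional_matmul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive half-size products instead of the usual eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Combine the products with additions and subtractions only.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 16
    A = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    B = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    # Integer entries make both products exact, so they must agree.
    assert strassen(A, B, cutoff=2) == conventional_matmul(A, B)
    print("Strassen and conventional products agree")
```

Integer entries are used in the usage example so that both products are computed exactly. With floating-point data the two results generally differ by rounding errors, and it is the size and structure of those errors that the stability analysis described above addresses.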