In parallel computing, the data distribution may have a significant impact on application performance and accuracy. These effects can be observed with the parallel matrix-vector multiplication routine from PBLAS by varying the process grid configuration used for the data distribution. Matrix-vector multiplication is an especially important operation because it is widely used in numerical simulation (e.g., in iterative solvers for linear systems of equations). This paper presents the mathematical background of error propagation in elementary operations and proposes benchmarks that show how different grid configurations, based on the two-dimensional block-cyclic distribution, affect the accuracy and performance of parallel matrix-vector operations. The experimental results validate the theoretical findings.
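The accuracy effect can be illustrated with a small single-process sketch. It is not the PBLAS routine and not the paper's benchmark: the names `owner_of_column`, `matvec_block_cyclic`, and `matvec_reference` are hypothetical, only the column dimension of the grid is simulated, and the partial sums are assumed to be combined left to right (a real reduction may use a different order). Each simulated process column accumulates the products for the columns it owns under a block-cyclic layout, so changing the block size or the number of process columns changes the order of the floating-point additions.

```python
import math


def owner_of_column(j, nb, pcols):
    """Process column that owns global column j under a block-cyclic
    layout with block size nb over pcols process columns."""
    return (j // nb) % pcols


def matvec_block_cyclic(A, x, nb, pcols):
    """Simulate y = A x when the columns of A are distributed
    block-cyclically. Each process column forms a local partial sum;
    the partial sums are then added left to right (an assumption)."""
    y = []
    for row in A:
        partial = [0.0] * pcols
        for j, (a, xj) in enumerate(zip(row, x)):
            partial[owner_of_column(j, nb, pcols)] += a * xj
        total = 0.0
        for p in partial:
            total += p
        y.append(total)
    return y


def matvec_reference(A, x):
    """Reference result: math.fsum adds the (already rounded) products
    without further rounding error in the summation."""
    return [math.fsum(a * xj for a, xj in zip(row, x)) for row in A]


# One row whose exact sum is 2.0, chosen so that the summation order matters.
A = [[1e16, 1.0, -1e16, 1.0]]
x = [1.0, 1.0, 1.0, 1.0]

one_column = matvec_block_cyclic(A, x, nb=1, pcols=1)   # sequential order
two_columns = matvec_block_cyclic(A, x, nb=1, pcols=2)  # columns 0,2 | 1,3
reference = matvec_reference(A, x)

print(one_column, two_columns, reference)
```

With these inputs the sequential order loses one of the small terms next to 1e16, while the two-column layout happens to group the large terms so that they cancel exactly. The data are identical in both runs; only the distribution, and therefore the order of additions, differs.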