Recent ScaLAPACK-style implementations of the Bartels-Stewart method and the iterative matrix-sign-function-based method for solving continuous-time Sylvester matrix equations are evaluated with respect to generality of use, execution time, and accuracy of the computed results. The test problems include both well-conditioned and ill-conditioned Sylvester equations. A method is considered more general if it can effectively solve a larger set of problems. Ill-conditioning is measured with respect to the separation of the two matrices in the Sylvester operator. Experiments carried out on two different distributed memory machines show that the parallel explicitly blocked Bartels-Stewart algorithm can solve more general problems and delivers far more accurate results for ill-conditioned problems. It is also up to four times faster for large enough problems on the most balanced parallel platform (IBM SP), while the parallel iterative algorithm is almost always the faster of the two on the less balanced platform (HPC2N Linux Super Cluster).
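To make the quantities in the abstract concrete, the following is a minimal serial sketch in Python with NumPy. It is not the parallel ScaLAPACK-style code that was evaluated. It assumes the continuous-time Sylvester equation is written as AX + XB = C, as in the Bartels-Stewart formulation, and that all eigenvalues of A and B lie in the open left half-plane, which the unscaled Newton iteration for the matrix sign function requires. The function names `sylvester_sign`, `sylvester_kron` and `sep` are hypothetical and chosen only for illustration. The dense Kronecker-form solve is used here as a small reference solver; it is not the Bartels-Stewart method, which reduces A and B to Schur form first.

```python
import numpy as np


def sylvester_sign(A, B, C, tol=1e-12, max_iter=100):
    """Solve A X + X B = C with the Newton iteration for the matrix sign function.

    Assumption: every eigenvalue of A and B has negative real part.
    The iteration acts on the blocks of H = [[A, -C], [0, -B]], whose sign
    is [[-I, 2X], [0, I]].
    """
    Ak = A.astype(float)
    Bk = B.astype(float)
    Ck = -C.astype(float)  # rewrite A X + X B = C as A X + X B + (-C) = 0
    n, m = A.shape[0], B.shape[0]
    for _ in range(max_iter):
        Ainv = np.linalg.inv(Ak)
        Binv = np.linalg.inv(Bk)
        # Block form of H_{k+1} = (H_k + H_k^{-1}) / 2
        Ck = 0.5 * (Ck + Ainv @ Ck @ Binv)
        Ak = 0.5 * (Ak + Ainv)
        Bk = 0.5 * (Bk + Binv)
        # A_k and B_k both converge to -I
        err = np.linalg.norm(Ak + np.eye(n)) + np.linalg.norm(Bk + np.eye(m))
        if err < tol:
            break
    return 0.5 * Ck


def kron_operator(A, B):
    """Matrix of the Sylvester operator X -> A X + X B acting on column-major vec(X)."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))


def sylvester_kron(A, B, C):
    """Reference solve through the dense Kronecker system (small problems only)."""
    n, m = A.shape[0], B.shape[0]
    x = np.linalg.solve(kron_operator(A, B), C.reshape(-1, order="F"))
    return x.reshape((n, m), order="F")


def sep(A, B):
    """Separation of A and -B: smallest singular value of the Sylvester operator."""
    return np.linalg.svd(kron_operator(A, B), compute_uv=False)[-1]


# Small well-conditioned test problem (illustrative data, not from the paper)
rng = np.random.default_rng(0)
n, m = 4, 3
A = -3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = -2.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

X_sign = sylvester_sign(A, B, C)
X_ref = sylvester_kron(A, B, C)
residual = np.linalg.norm(A @ X_sign + X_sign @ B - C)
separation = sep(A, B)
```

A small value of `separation` indicates an ill-conditioned equation, which is the regime where the abstract reports the largest accuracy gap between the two methods. For a direct solver in the Bartels-Stewart style, SciPy provides `scipy.linalg.solve_sylvester`, which could replace `sylvester_kron` in this sketch.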