ACM Transactions on Mathematical Software (TOMS)
Triangular matrix equations appear naturally in estimating the condition numbers of matrix equations and in different eigenspace computations, including block-diagonalization of matrices and matrix pairs and computation of functions of matrices. Solving a triangular matrix equation is also a major step in the classical Bartels–Stewart method for solving the standard continuous-time Sylvester equation (AX − XB = C). We present novel recursive blocked algorithms for solving one-sided triangular matrix equations, including the continuous-time Sylvester and Lyapunov equations, and a generalized coupled Sylvester equation. The main parts of the computations are performed as level-3 general matrix multiply and add (GEMM) operations. In contrast to explicit standard blocking techniques, our recursive approach leads to an automatic variable blocking that has the potential to match the memory hierarchies of today's HPC systems. We discuss several implementation issues, including when to terminate the recursion, the design of new optimized superscalar kernels for solving leaf-node triangular matrix equations efficiently, and how parallelism is utilized in our implementations. We present uniprocessor and SMP parallel performance results for our recursive blocked algorithms and for the corresponding routines in the state-of-the-art libraries LAPACK and SLICOT. The performance improvements of our recursive algorithms are substantial, including up to 10-fold speedups compared to the standard algorithms.
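The recursive splitting idea can be illustrated for the triangular continuous-time Sylvester equation AX − XB = C. The following is a minimal Python/NumPy sketch, not the authors' implementation. It makes the following assumptions:

* A (m × m) and B (n × n) are already upper triangular, for example Schur factors from the Bartels–Stewart reduction.
* A and B have no eigenvalues in common, so the solution is unique.

At each level the sketch halves the larger dimension and pushes the coupling between the two halves into matrix-multiply updates, which map onto GEMM calls in a BLAS-based implementation. The cutoff `LEAF` and the helper names `rec_sylv` and `solve_leaf` are hypothetical. The leaf solver uses a dense Kronecker-product system as a simple stand-in for the optimized superscalar kernels described above.

```python
import numpy as np

LEAF = 4  # hypothetical recursion cutoff; a real choice would be tuned per machine


def solve_leaf(A, B, C):
    """Solve A X - X B = C for a small block via its Kronecker form.

    Uses (I_n kron A - B^T kron I_m) vec(X) = vec(C) with column-major vec.
    This is only a stand-in for an optimized triangular leaf kernel.
    """
    m, n = C.shape
    K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, C.reshape(-1, order="F"))
    return x.reshape((m, n), order="F")


def rec_sylv(A, B, C):
    """Recursive blocked solver for A X - X B = C.

    A (m x m) and B (n x n) are assumed upper triangular with no common
    eigenvalues, so the solution X (m x n) is unique.
    """
    m, n = C.shape
    if m <= LEAF and n <= LEAF:
        return solve_leaf(A, B, C)
    if m >= n:
        # Split A = [[A11, A12], [0, A22]] and X, C by rows.
        k = m // 2
        # Bottom block: A22 X2 - X2 B = C2
        X2 = rec_sylv(A[k:, k:], B, C[k:, :])
        # Top block after a GEMM-style update: A11 X1 - X1 B = C1 - A12 X2
        X1 = rec_sylv(A[:k, :k], B, C[:k, :] - A[:k, k:] @ X2)
        return np.vstack([X1, X2])
    # Split B = [[B11, B12], [0, B22]] and X, C by columns.
    k = n // 2
    # Left block: A X1 - X1 B11 = C1
    X1 = rec_sylv(A, B[:k, :k], C[:, :k])
    # Right block after a GEMM-style update: A X2 - X2 B22 = C2 + X1 B12
    X2 = rec_sylv(A, B[k:, k:], C[:, k:] + X1 @ B[:k, k:])
    return np.hstack([X1, X2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 13, 9
    # Triangular test matrices whose diagonals (eigenvalues) are well separated.
    A = np.triu(0.5 * rng.standard_normal((m, m)), 1) + np.diag(np.arange(1.0, m + 1))
    B = np.triu(0.5 * rng.standard_normal((n, n)), 1) - np.diag(np.arange(1.0, n + 1))
    C = rng.standard_normal((m, n))
    X = rec_sylv(A, B, C)
    print("residual norm:", np.linalg.norm(A @ X - X @ B - C))
```

Because the split point is always the midpoint of the larger dimension, block sizes shrink automatically until they reach the leaf cutoff. This is the kind of variable blocking the abstract refers to: only the leaf size remains a tuning parameter, rather than a fixed block size for every level of the memory hierarchy.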