In this article we develop algorithms and tools for solving matrix problems on parallel processing computers. Operations are synchronized through data flow alone, which makes global synchronization unnecessary and allows the algorithms to be implemented on machines with very simple operating systems and communication protocols. As examples, we present algorithms that form the main modules for solving Lyapunov (Liapounov) matrix equations. We compare this approach to wavefront array processors and systolic arrays, and note its advantages in handling mis-sized problems, in evaluating variations of algorithms or architectures, in moving algorithms from system to system, and in debugging parallel algorithms on sequential machines.
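The idea of synchronizing through data flow alone can be illustrated with a small sketch. The code below is not taken from the article: the names `Node`, `run`, and `forward_substitution_graph` are hypothetical, and forward substitution for a lower-triangular system Lx = b is chosen only as a simple stand-in for the kind of module a matrix-equation solver might use. Each node fires as soon as all of its inputs have arrived, with no global clock or barrier, and the whole graph is driven by an ordinary sequential loop, which is one way to picture debugging a parallel algorithm on a sequential machine.

```python
from collections import deque


class Node:
    """A computational node that fires once all expected inputs have arrived."""

    def __init__(self, name, n_inputs, compute):
        self.name = name
        self.n_inputs = n_inputs    # number of input tokens required to fire
        self.compute = compute      # function: dict {port: value} -> output
        self.inbox = {}             # tokens received so far, keyed by port
        self.successors = []        # (node, port) pairs that receive our output
        self.fired = False

    def ready(self):
        return not self.fired and len(self.inbox) == self.n_inputs


def run(nodes):
    """Sequential scheduler: fire any node whose inputs are complete.

    There is no global synchronization; the firing order is determined
    only by the arrival of data.
    """
    queue = deque(n for n in nodes if n.ready())
    results, order = {}, []
    while queue:
        node = queue.popleft()
        value = node.compute(node.inbox)
        node.fired = True
        results[node.name] = value
        order.append(node.name)
        # Send the output token to every successor; a successor becomes
        # ready exactly when its last missing input arrives.
        for succ, port in node.successors:
            succ.inbox[port] = value
            if succ.ready():
                queue.append(succ)
    return results, order


def forward_substitution_graph(L, b):
    """Build a data-flow graph for L x = b (L lower triangular, illustrative).

    Node i needs x_0 .. x_{i-1} and then computes
    x_i = (b_i - sum_j L[i][j] * x_j) / L[i][i].
    """
    n = len(b)
    nodes = []
    for i in range(n):
        def compute(inbox, i=i):
            s = sum(L[i][j] * inbox[j] for j in range(i))
            return (b[i] - s) / L[i][i]
        nodes.append(Node(i, i, compute))
    for j in range(n):
        for i in range(j + 1, n):
            nodes[j].successors.append((nodes[i], j))
    return nodes


if __name__ == "__main__":
    L = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]
    b = [2, 7, 32]
    results, order = run(forward_substitution_graph(L, b))
    print(results)  # {0: 1.0, 1: 2.0, 2: 3.0}
```

In this sketch the scheduler is a single queue, but nothing in a node depends on that: the same nodes could in principle be placed on separate processors that exchange messages, since each one only waits for its own inputs.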