A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. The implementation uses a systolic algorithm based on a rectangular processor layout. For a given operand, all processors hold submatrices of the same size. In the implementation on the Connection Machine system CM-2, matrix-vector multiplication serves as the primitive for local matrix-matrix multiplication. The peak performance of the local matrix-matrix multiplication exceeds 20 Gflop/s. The overall algorithm, including all required data motion, has a peak performance of 5.8 Gflop/s.
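As a rough illustration of the kind of systolic scheme the abstract describes, the following minimal sketch simulates a Cannon-style algorithm on a square q x q processor grid in plain Python. This is a swapped-in simplification, not the paper's method: the abstract specifies a rectangular processor layout and arbitrary matrix sizes, whereas this sketch assumes a square grid and matrix dimensions divisible by q. All function names (`matvec`, `local_matmul_acc`, `split`, `systolic_matmul`) are hypothetical. The sketch does reflect two points from the abstract: every simulated processor holds equal-size blocks of a given operand, and the local block product is built from matrix-vector products.

```python
def matvec(a, x):
    """Dense matrix-vector product, used here as the local primitive."""
    return [sum(a_rk * x_k for a_rk, x_k in zip(row, x)) for row in a]


def local_matmul_acc(c, a, b):
    """Accumulate c += a @ b using one matrix-vector product per column of b."""
    for j in range(len(b[0])):
        col = [b_row[j] for b_row in b]
        for i, y in enumerate(matvec(a, col)):
            c[i][j] += y


def split(mat, q):
    """Cut mat into a q x q grid of equal-size blocks (assumes divisibility by q)."""
    rows, cols = len(mat), len(mat[0])
    assert rows % q == 0 and cols % q == 0, "dimensions must be divisible by q"
    br, bc = rows // q, cols // q
    return [[[row[j * bc:(j + 1) * bc] for row in mat[i * br:(i + 1) * br]]
             for j in range(q)] for i in range(q)]


def systolic_matmul(a, b, q):
    """Simulate Cannon-style systolic multiplication of a (m x k) by b (k x n)."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    m, n = len(a), len(b[0])
    a_blk, b_blk = split(a, q), split(b, q)

    # Initial alignment: row i of A blocks is rotated left by i positions,
    # column j of B blocks is rotated up by j positions.
    a_loc = [[a_blk[i][(i + j) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]

    br, bc = m // q, n // q
    c_loc = [[[[0.0] * bc for _ in range(br)] for _ in range(q)]
             for _ in range(q)]

    for _ in range(q):
        # Every simulated processor multiplies the blocks it currently holds.
        for i in range(q):
            for j in range(q):
                local_matmul_acc(c_loc[i][j], a_loc[i][j], b_loc[i][j])
        # Data motion: A blocks move one step left, B blocks one step up.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Gather the distributed C blocks back into one m x n matrix.
    return [[c_loc[i // br][j // bc][i % br][j % bc] for j in range(n)]
            for i in range(m)]
```

For example, `systolic_matmul(a, b, 2)` with a 4 x 6 matrix `a` and a 6 x 2 matrix `b` runs two multiply-and-shift steps on a simulated 2 x 2 grid. On a real data parallel machine the two nested loops over `i` and `j` would run concurrently across processors, and the list rotations would become nearest-neighbour communication.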