Matrix multiplication on the Connection Machine

  • Authors:
  • S. L. Johnsson; T. Harris; K. K. Mathur

  • Affiliations:
  • Thinking Machines Corp., 245 First Street, Cambridge, MA and Department of Computer Science, Yale University, New Haven, CT; Thinking Machines Corp., 245 First Street, Cambridge, MA; Thinking Machines Corp., 245 First Street, Cambridge, MA

  • Venue:
  • Proceedings of the 1989 ACM/IEEE conference on Supercomputing
  • Year:
  • 1989

Abstract

A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. The implementation uses a systolic algorithm based on a rectangular processor layout. For a given operand, all processors hold submatrices of the same size. In the implementation on the Connection Machine system CM-2, matrix-vector multiplication serves as the primitive for local matrix-matrix multiplication. The peak performance of the local matrix-matrix multiplication is in excess of 20 Gflop/s. The overall algorithm, including all required data motion, has a peak performance of 5.8 Gflop/s.
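The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It simulates a Cannon-style systolic scheme on a square p x p grid, which is a simplification of the rectangular layout the abstract describes, and it assumes every matrix dimension is divisible by p so that all blocks of an operand have the same size. The local block product is built from matrix-vector products, mirroring the primitive mentioned above. All function names (`matvec`, `local_matmul_accumulate`, `split_blocks`, `systolic_matmul`) are hypothetical.

```python
def matvec(a, x):
    """Dense matrix-vector product a * x on plain Python lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def local_matmul_accumulate(c, a, b):
    """c += a * b, computed one column of b at a time via matvec."""
    for j in range(len(b[0])):
        col = [row[j] for row in b]
        for i, yi in enumerate(matvec(a, col)):
            c[i][j] += yi


def split_blocks(m, p, rows, cols):
    """Cut matrix m into a p x p grid of blocks, each rows x cols."""
    return [[[r[j * cols:(j + 1) * cols] for r in m[i * rows:(i + 1) * rows]]
             for j in range(p)] for i in range(p)]


def systolic_matmul(a, b, p):
    """Multiply a (n x k) by b (k x m) on a simulated p x p processor grid.

    Assumption for this sketch: n, k and m are all divisible by p.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert n % p == 0 and k % p == 0 and m % p == 0
    bn, bk, bm = n // p, k // p, m // p

    A = split_blocks(a, p, bn, bk)
    B = split_blocks(b, p, bk, bm)
    C = [[[[0] * bm for _ in range(bn)] for _ in range(p)] for _ in range(p)]

    # Initial skew: block row i of A rotates left by i,
    # block column j of B rotates up by j, so that processor (i, j)
    # starts with blocks sharing the same inner index (i + j) mod p.
    A = [[A[i][(j + i) % p] for j in range(p)] for i in range(p)]
    B = [[B[(i + j) % p][j] for j in range(p)] for i in range(p)]

    for _ in range(p):
        # Every processor multiplies the blocks it currently holds.
        for i in range(p):
            for j in range(p):
                local_matmul_accumulate(C[i][j], A[i][j], B[i][j])
        # Systolic step: A blocks move one processor left, B blocks one up.
        A = [[A[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        B = [[B[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    # Reassemble the distributed result into one matrix.
    return [sum((C[i][j][r] for j in range(p)), [])
            for i in range(p) for r in range(bn)]
```

For example, `systolic_matmul(a, b, 2)` with a 4 x 6 matrix `a` and a 6 x 2 matrix `b` distributes 2 x 3 blocks of `a` and 3 x 1 blocks of `b` over four simulated processors. In this simulation the shifts are plain list rotations; on a real machine they would be nearest-neighbour communication, which is the data motion the abstract includes in the overall figure.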