Expressing Boolean cube matrix algorithms in shared memory primitives

  • Authors:
  • S. L. Johnsson;C-T. Ho

  • Affiliations:
  • Department of Computer Science, Yale University, New Haven, CT;Department of Computer Science, Yale University, New Haven, CT

  • Venue:
  • C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
  • Year:
  • 1989

Quantified Score

Hi-index 0.00

Visualization

Abstract

The multiplication of (large) matrices allocated evenly on Boolean cube configured multiprocessors poses several interesting trade-offs with respect to communication time, processor utilization, and storage requirement. In [7] we investigated several algorithms for different degrees of parallelization, and showed how the choice of algorithm with respect to performance depends on the matrix shape, and the multiprocessor parameters, and how processors should be allocated optimally to the different loops.In this paper the focus is on expressing the algorithms in shared memory type primitives. We assume that all processors share the same global address space, and present communication primitives both for nearest-neighbor communication, and global operations such as broadcasting from one processor to a set of processors, the reverse operation of plus-reduction, and matrix transposition (dimension permutation). We consider both the case where communication is restricted to one processor port at a time, or concurrent communication on all processor ports. The communication algorithms are provably optimal within a factor of two. We describe both constant storage algorithms, and algorithms with reduced communication time, but a storage need proportional to the number of processors and the matrix sizes (for a one-dimensional partitioning of the matrices).