The general matrix multiply-add operation on 2D torus

  • Authors:
  • Ahmed S. Zekri; Stanislav G. Sedukhin

  • Affiliations:
  • The Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Fukushima, Japan; The Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Fukushima, Japan

  • Venue:
  • IPDPS'06: Proceedings of the 20th International Conference on Parallel and Distributed Processing
  • Year:
  • 2006

Abstract

In this paper, the index space of the (n×n)-matrix multiply-add problem C = C + A·B is represented as a 3D n×n×n torus. All possible time-scheduling functions that activate the computation and data rolling inside the 3D torus index space are determined. To maximize efficiency when solving a single problem, we mapped the computations onto a 2D n×n toroidal array processor. All optimal 2D data allocations that solve the problem in n multiply-add-roll steps are obtained; the well-known Cannon's algorithm is one of the resulting allocations. We used the optimal data allocations to describe all variants of the GEMM operation on the 2D toroidal array processor. By controlling the data movement, we avoided the transposition operation in 75% of the GEMM variants; the remaining 25% need only one matrix transpose. Finally, we described four versions of the GEMM operation covering the possible layouts of the data initially loaded into the array processor.
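Cannon's algorithm, which the abstract identifies as one of the optimal allocations, gives a concrete picture of what "n multiply-add-roll steps" on an n×n torus means. The following is a minimal serial NumPy simulation, not the authors' formulation. It assumes one matrix element per processor and the conventional initial skew (row i of A rotated left by i, column j of B rotated up by j). With that skew, at step s processor (i, j) handles the index point (i, j, k) with k = (i + j + s) mod n, so the n steps sweep the whole n×n×n torus index space. The function name `cannon_torus` is hypothetical, and the other allocations and GEMM variants from the paper are not modeled.

```python
import numpy as np


def cannon_torus(A, B, C):
    """Simulate C = C + A @ B on an n x n torus, one element per processor.

    Array position (i, j) stands for processor (i, j), which holds
    a[i, j], b[i, j] and c[i, j]. After an initial skew, each of the n
    steps performs one local multiply-add followed by one roll of A and B.
    """
    n = A.shape[0]
    a = A.copy()
    b = B.copy()
    c = C.astype(np.result_type(A, B, C))  # astype returns a copy

    # Initial alignment (skew): row i of A rotates left by i and column j
    # of B rotates up by j, so processor (i, j) starts with A[i, k] and
    # B[k, j] for k = (i + j) mod n.
    for i in range(n):
        a[i, :] = np.roll(a[i, :], -i)
    for j in range(n):
        b[:, j] = np.roll(b[:, j], -j)

    # n multiply-add-roll steps; at step s processor (i, j) uses
    # k = (i + j + s) mod n, so every k in 0..n-1 is visited exactly once.
    for _ in range(n):
        c += a * b                  # local multiply-add on every processor
        a = np.roll(a, -1, axis=1)  # roll A one position left along rows
        b = np.roll(b, -1, axis=0)  # roll B one position up along columns
    return c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A = rng.integers(0, 10, (n, n))
    B = rng.integers(0, 10, (n, n))
    C = rng.integers(0, 10, (n, n))
    print(np.array_equal(cannon_torus(A, B, C), C + A @ B))
```

The final roll in the last step is not needed for the result. It is kept so that every step has the same multiply-add-roll shape. Integer inputs are used in the usage example so that the comparison with `C + A @ B` is exact.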