Parallel blocked algorithm for solving the algebraic path problem on a matrix processor

  • Authors:
  • Akihito Takahashi;Stanislav Sedukhin

  • Affiliations:
  • Graduate School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu City, Fukushima, Japan;Graduate School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu City, Fukushima, Japan

  • Venue:
  • HPCC'05 Proceedings of the First International Conference on High Performance Computing and Communications
  • Year:
  • 2005

Abstract

This paper presents a parallel blocked algorithm for the algebraic path problem (APP). The APP is known to have the same complexity as classical matrix-matrix multiplication; however, solving the APP takes much more running time because its unique data dependencies drastically limit data reuse. We examine a parallel implementation of a blocked algorithm for the APP on the one-chip Intrinsity FastMATH adaptive processor, which consists of a scalar MIPS processor extended with a SIMD matrix coprocessor. The matrix coprocessor supports native matrix instructions on an array of 4 × 4 processing elements. Implementing with matrix instructions requires us to reformulate algorithms in terms of matrix-matrix operations. Conventional vectorization for SIMD vector processing deals only with the innermost loop; on the FastMATH processor, however, we need to vectorize two or three nested loops in order to convert them into one equivalent matrix operation. Our experimental results show a peak performance of 9.27 GOPS and high usage rates of matrix instructions for solving the APP. These findings indicate that a SIMD matrix extension to a (super)scalar processor would be very useful for the fast solution of many matrix-formulated problems.
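
To make the idea of a blocked APP algorithm concrete, the following is a minimal sketch of one well-known instance of the APP: all-pairs shortest paths over the (min, +) semiring, computed with a blocked Floyd-Warshall scheme. It is an illustration only and not the authors' FastMATH implementation. The function names, the choice of the shortest-path instance, and the assumption that the matrix size is a multiple of the block size are ours; the block size of 4 is chosen to mirror the 4 × 4 array of processing elements mentioned in the abstract.

```python
INF = float("inf")
B = 4  # block size; assumed here to mirror the 4 x 4 PE array


def floyd_warshall(dist):
    """Unblocked reference: classic triple loop with k outermost."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def block_update(d, ib, jb, kb, b=B):
    """In-place (min, +) update of block (ib, jb) from blocks (ib, kb) and (kb, jb).

    The two inner loops range over one b x b block; on a matrix
    processor they are the natural candidates for a single matrix
    operation (an assumption of this sketch, not a FastMATH listing).
    """
    for k in range(kb * b, (kb + 1) * b):
        for i in range(ib * b, (ib + 1) * b):
            for j in range(jb * b, (jb + 1) * b):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd_warshall(dist, b=B):
    """Blocked variant: three phases per diagonal block index kb."""
    n = len(dist)
    assert n % b == 0, "sketch assumes n is a multiple of the block size"
    d = [row[:] for row in dist]
    nb = n // b
    for kb in range(nb):
        # Phase 1: the diagonal block depends only on itself.
        block_update(d, kb, kb, kb, b)
        # Phase 2: blocks in the same block row and block column.
        for jb in range(nb):
            if jb != kb:
                block_update(d, kb, jb, kb, b)
        for ib in range(nb):
            if ib != kb:
                block_update(d, ib, kb, kb, b)
        # Phase 3: all remaining blocks (a min-plus multiply-accumulate).
        for ib in range(nb):
            for jb in range(nb):
                if ib != kb and jb != kb:
                    block_update(d, ib, jb, kb, b)
    return d
```

The phase ordering reflects the data dependencies the abstract refers to: within each step, the diagonal block must be finished before its row and column blocks, which in turn must be finished before the remaining blocks. Only the third phase has the independence of an ordinary matrix multiplication, which is one way to see why data reuse is more limited than in plain matrix-matrix multiplication.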