This paper presents a parallel blocked algorithm for the algebraic path problem (APP). The APP is known to have the same complexity as classical matrix-matrix multiplication; however, solving it takes much more running time because its unique data dependencies drastically limit data reuse. We examine a parallel implementation of a blocked APP algorithm on the single-chip Intrinsity FastMATH adaptive processor, which consists of a scalar MIPS processor extended with a SIMD matrix coprocessor. The matrix coprocessor supports native matrix instructions on an array of 4 × 4 processing elements. Using these matrix instructions requires reformulating the algorithms in terms of matrix-matrix operations. Conventional vectorization for SIMD vector processing handles only the innermost loop; on the FastMATH processor, however, two or three nested loops must be vectorized so that they can be converted into a single equivalent matrix operation. Our experimental results show a peak performance of 9.27 GOPS and high usage rates of matrix instructions when solving the APP. These findings indicate that a SIMD matrix extension to a (super)scalar processor would be very useful for the fast solution of many matrix-formulated problems.
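To make the blocking idea concrete, the following is a minimal sketch, not the paper's FastMATH implementation. It assumes the shortest-path instance of the APP, i.e. the (min, +) semiring (swapping the operators, e.g. (or, and), would give transitive closure), and uses a standard three-phase blocked Floyd-Warshall scheme. The tile size `TILE = 4` is an assumption chosen to mirror the 4 × 4 processing-element array, and the function names (`update_tile`, `minplus_accumulate`, `blocked_floyd_warshall`) are hypothetical. The point it illustrates is the dependency structure: the diagonal tile and its row and column must be relaxed with a sequential k loop, while every remaining tile becomes an independent triple loop that can be treated as one min-plus matrix operation.

```python
import math
import random

INF = math.inf
TILE = 4  # assumption: mirrors the 4 x 4 processing-element array


def floyd_warshall(dist):
    """Reference O(n^3) Floyd-Warshall (min, +) closure on a copy."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def update_tile(d, ib, jb, kb, t):
    """Relax tile (ib, jb) through every intermediate vertex in tile kb.

    The k loop stays sequential because the tile being written also
    supplies d[i][k] or d[k][j] (phases 1 and 2), so order matters.
    """
    for k in range(kb * t, (kb + 1) * t):
        for i in range(ib * t, (ib + 1) * t):
            dik = d[i][k]
            for j in range(jb * t, (jb + 1) * t):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]


def minplus_accumulate(d, ib, jb, kb, t):
    """C = min(C, A (x) B) over (min, +), C=(ib,jb), A=(ib,kb), B=(kb,jb).

    In phase 3, C never feeds A or B, so this whole loop nest is one
    independent tile operation: the kind of nest that could be mapped
    onto a single matrix instruction.
    """
    for i in range(ib * t, (ib + 1) * t):
        for j in range(jb * t, (jb + 1) * t):
            best = d[i][j]
            for k in range(kb * t, (kb + 1) * t):
                best = min(best, d[i][k] + d[k][j])
            d[i][j] = best


def blocked_floyd_warshall(dist, t=TILE):
    """Three-phase blocked Floyd-Warshall (assumes n % t == 0)."""
    n = len(dist)
    assert n % t == 0, "sketch assumes n is a multiple of the tile size"
    d = [row[:] for row in dist]
    nb = n // t
    for kb in range(nb):
        # Phase 1: the diagonal tile depends only on itself.
        update_tile(d, kb, kb, kb, t)
        # Phase 2: tiles in row kb and column kb depend on the diagonal tile.
        for b in range(nb):
            if b != kb:
                update_tile(d, kb, b, kb, t)
                update_tile(d, b, kb, kb, t)
        # Phase 3: all remaining tiles are independent min-plus updates.
        for ib in range(nb):
            for jb in range(nb):
                if ib != kb and jb != kb:
                    minplus_accumulate(d, ib, jb, kb, t)
    return d


if __name__ == "__main__":
    random.seed(0)
    n = 8
    g = [[0 if i == j else (random.randint(1, 9) if random.random() < 0.4 else INF)
          for j in range(n)] for i in range(n)]
    assert blocked_floyd_warshall(g) == floyd_warshall(g)
    print("blocked result matches reference")
```

Only phase 3 is free of intra-tile dependencies, and it accounts for all but 2·nb − 1 of the nb² tiles in each outer iteration, which is why a blocked formulation exposes most of the work as matrix-matrix style operations even though the APP as a whole limits data reuse.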