Design of Space-Optimal Regular Arrays for Algorithms with Linear Schedules
IEEE Transactions on Computers
An Optimal Fault-Tolerant Design Approach for Array Processors
Proceedings of the 1994 International Conference on Parallel and Distributed Systems
Massive parallel processing for matrix multiplication: a systolic approach
Highly parallel computations
Algorithm-Based Fault Tolerance for Matrix Operations
IEEE Transactions on Computers
Determining objective functions in systolic array designs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Complexity of matrix product on modular linear systolic arrays for algorithms with affine schedules
Journal of Parallel and Distributed Computing
A class of fault-tolerant systolic arrays for matrix multiplication
Mathematical and Computer Modelling: An International Journal
An approach to designing a fault-tolerant hexagonal systolic array (SA) for the multiplication of rectangular matrices is described. The approach comprises three steps. First, redundancy is introduced at the computational level by deriving three equivalent algorithms with disjoint index spaces. Second, the index spaces are accommodated to the projection direction to obtain a hexagonal SA with the optimal number of processing elements (PEs) for a given problem size. Finally, the accommodated index spaces are mapped onto a fault-tolerant systolic array using a valid transformation matrix. The result is an SA with the optimal number of PEs that performs fault-tolerant matrix multiplication. For square matrices of order N × N, this array comprises N^2 + 2N PEs, with an active computation time of t_c = 5N - 4 time units. Fault tolerance is achieved through triplicated computation of the same problem instance followed by majority voting. We propose two hardware solutions for the voting process: one in which voting is performed at the end of the computation, i.e., at the output of the SA, and one in which voting is performed after each computational step. With the proposed method, any single transient or permanent fault can be detected and corrected. Experimental results show that the proposed schemes can also tolerate many multiple-error patterns.
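Triplicated computation with majority voting can be illustrated with a small software model. The Python sketch below is not the authors' hardware design. It runs three replicas of C = A·B directly, whereas the paper derives three equivalent algorithms with disjoint index spaces and maps them onto one hexagonal array. It then compares the two voting placements described above: a single vote at the output versus a vote after every accumulation step. All function names, the fault-injection hook, and the additive fault model are hypothetical and serve only as illustration. The helpers `pe_count` and `active_time` only evaluate the N^2 + 2N and 5N - 4 figures stated for N × N matrices.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[int]]
# Hypothetical fault hook: (replica, i, j, k, value) -> possibly corrupted value.
FaultHook = Callable[[int, int, int, int, int], int]


def no_fault(replica: int, i: int, j: int, k: int, value: int) -> int:
    """Fault-free hook: returns the partial sum unchanged."""
    return value


def make_additive_faults(spec: Dict[Tuple[int, int, int, int], int]) -> FaultHook:
    """Hypothetical fault model: add spec[(replica, i, j, k)] to that partial sum."""
    def hook(replica: int, i: int, j: int, k: int, value: int) -> int:
        return value + spec.get((replica, i, j, k), 0)
    return hook


def majority(a: int, b: int, c: int) -> int:
    """2-out-of-3 vote; raises if all three replicas disagree."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no majority: all three replicas disagree")


def tmr_matmul_vote_at_output(A: Matrix, B: Matrix, fault: FaultHook = no_fault) -> Matrix:
    """Run three independent replicas of C = A*B, then vote once per output element."""
    n, m, p = len(A), len(B), len(B[0])
    replicas = []
    for r in range(3):
        C = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                acc = 0
                for k in range(m):
                    # Each replica keeps its own partial sum, so an error propagates.
                    acc = fault(r, i, j, k, acc + A[i][k] * B[k][j])
                C[i][j] = acc
        replicas.append(C)
    return [[majority(replicas[0][i][j], replicas[1][i][j], replicas[2][i][j])
             for j in range(p)] for i in range(n)]


def tmr_matmul_vote_each_step(A: Matrix, B: Matrix, fault: FaultHook = no_fault) -> Matrix:
    """Vote after every accumulation step and feed the voted sum back to all replicas."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                votes = [fault(r, i, j, k, acc + A[i][k] * B[k][j]) for r in range(3)]
                acc = majority(*votes)  # a single faulty replica is outvoted here
            C[i][j] = acc
    return C


def pe_count(N: int) -> int:
    """PE count stated for N x N matrices: N^2 + 2N."""
    return N * N + 2 * N


def active_time(N: int) -> int:
    """Active computation time stated for N x N matrices: t_c = 5N - 4."""
    return 5 * N - 4


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    # Faults in two different replicas at different steps of C[0][0].
    hook = make_additive_faults({(0, 0, 0, 0): 100, (1, 0, 0, 1): 1000})
    print(tmr_matmul_vote_each_step(A, B, hook))  # [[19, 22], [43, 50]]
    print(pe_count(4), active_time(4))  # 24 16
```

In this model, voting at the output corrects any number of faults that stay confined to one replica. In the usage example, the faults hit two different replicas at different steps, so no majority exists at the output. Voting after each step still corrects it, because only one replica is faulty at any given step. This is only a qualitative illustration of the difference between the two voting placements and does not reproduce the paper's experimental results.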