An Efficient Algorithm-Based Fault Tolerance Design Using the Weighted Data-Check Relationship

Authors:
Hee Yong Youn;Choong Gun Oh;Hyunseung Choo;Jin-Wook Chung;Dongman Lee
Affiliations:
Sungkyunkwan Univ., Suwon, Korea;Nsystems Communications, San Diego, CA;Sungkyunkwan Univ., Suwon, Korea;Sungkyunkwan Univ., Suwon, Korea;Information and Communications Univ., Taejon, Korea
Venue:
IEEE Transactions on Computers
Year:
2001

Abstract

VLSI-based processor arrays have been widely used for computation-intensive applications such as matrix and graph algorithms. Algorithm-based fault tolerance designs employing various encoding/decoding schemes have been proposed for such systems to tolerate operation-time faults effectively. In this paper, we propose an efficient algorithm-based fault tolerance design using the weighted data-check relationship, in which the checks are computed from weighted data. The relationship is systematically defined as a new $(n,k,N_w)$ Hamming checksum code, where $n$ is the size of the code word, $k$ is the number of information elements in the code word, and $N_w$ is the number of weights employed. The proposed design is evaluated with various weights in terms of time and hardware overhead as well as overflow probability and round-off error. Two schemes, employing the $(n,k,2)$ and $(n,k,3)$ Hamming checksum codes, are illustrated using important matrix computations. Comparison with other schemes reveals that the $(n,k,3)$ Hamming checksum scheme is very efficient while requiring only a small hardware overhead.
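
To illustrate the general idea of obtaining checks from weighted data, the following is a minimal sketch of a classic two-check weighted checksum over integers. It is not the $(n,k,N_w)$ Hamming checksum code of the paper, whose construction the abstract does not specify. The weights ($1$ and $i+1$ for the element at index $i$) and the function names `weighted_checks`, `encode`, and `correct_single_error` are assumptions chosen for illustration only.

```python
def weighted_checks(data):
    """Return two checks: the plain sum and the index-weighted sum.

    Assumed weights (illustrative, not taken from the paper):
    weight 1 for the first check, weight i + 1 for the second.
    """
    plain = sum(data)
    weighted = sum((i + 1) * d for i, d in enumerate(data))
    return plain, weighted


def encode(data):
    """Append the two check elements to the k information elements."""
    return list(data) + list(weighted_checks(data))


def correct_single_error(word):
    """Detect and correct at most one erroneous element of an encoded word.

    Returns (data, location), where location is None (no error),
    "check" (a check element was faulty), or the index of the
    corrected information element.
    """
    data, c1, c2 = list(word[:-2]), word[-2], word[-1]
    s1, s2 = weighted_checks(data)
    d1, d2 = s1 - c1, s2 - c2          # syndromes
    if d1 == 0 and d2 == 0:
        return data, None
    if d1 == 0 or d2 == 0:
        # A faulty information element disturbs both syndromes,
        # so a single nonzero syndrome points at a check element.
        return data, "check"
    # For an error e at index i: d1 = e and d2 = (i + 1) * e.
    pos, rem = divmod(d2, d1)
    if rem != 0 or not 1 <= pos <= len(data):
        raise ValueError("error pattern is not a single-element error")
    data[pos - 1] -= d1
    return data, pos - 1


if __name__ == "__main__":
    original = [3, -1, 4, 1, 5]
    word = encode(original)
    word[2] += 7                        # inject a fault into one element
    print(correct_single_error(word))
```

The ratio of the two syndromes locates the faulty element and the first syndrome gives the error magnitude. Integer data keeps the example exact; with floating-point data, the comparisons would need a tolerance, which relates to the round-off error the abstract mentions as an evaluation criterion.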