On Fault-Tolerant Structure, Distributed Fault-Diagnosis, Reconfiguration, and Recovery of the Array Processors

Authors: S. H. Hosseini
Affiliations: Univ. of Wisconsin–Milwaukee, Milwaukee
Venue: IEEE Transactions on Computers
Year: 1989

Abstract

This paper studies the design of fault-tolerant array processors. It shows how hardware redundancy can be added to existing structures so that they can withstand the failure of some of the array links and processors. Distributed fault-tolerance schemes are introduced for diagnosing the faulty elements and for reconfiguring and recovering the array. Fault tolerance is maintained through the cooperation of processors under decentralized control, without the participation of any hardcore or fault-free central controller such as a host computer. Time redundancy is used by assigning the functions of the failed processors to fault-free processors.
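
The time-redundancy idea in the last sentence can be illustrated with a small sketch. This is not the paper's algorithm: the abstract gives no procedure, so the mesh topology, the nearest-neighbor takeover rule, and the names `reassign_tasks`, `nearest_fault_free`, and `time_steps` are all assumptions made for illustration. The sketch also computes the mapping in one place for readability, whereas the paper describes decentralized cooperation among processors, and it assumes all links are fault-free.

```python
from collections import deque


def nearest_fault_free(start, rows, cols, faulty):
    """Breadth-first search over mesh links for the closest fault-free processor.

    Assumption: a 2-D mesh with fault-free links; the search may pass
    through faulty cells, which a real bypass scheme would have to support.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) not in faulty:
            return (r, c)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None  # every processor in the array is faulty


def reassign_tasks(rows, cols, faulty):
    """Map each cell's function to the fault-free processor that executes it."""
    faulty = set(faulty)
    return {
        (r, c): nearest_fault_free((r, c), rows, cols, faulty)
        for r in range(rows)
        for c in range(cols)
    }


def time_steps(owner):
    """Count how many functions each fault-free processor runs per array cycle."""
    load = {}
    for proc in owner.values():
        if proc is not None:
            load[proc] = load.get(proc, 0) + 1
    return load


if __name__ == "__main__":
    owner = reassign_tasks(3, 3, {(1, 1)})
    print(owner[(1, 1)])               # processor that takes over the failed cell
    print(max(time_steps(owner).values()))  # worst-case slowdown factor
```

In this hypothetical setup, a processor that takes over a failed neighbor's function runs two functions per cycle, so the array trades extra time for continued operation instead of requiring a spare processor.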