Specialized N-modular redundant processors in large-scale distributed systems

  • Authors: I-Ling Yen
  • Venue: SRDS '96 Proceedings of the 15th Symposium on Reliable Distributed Systems
  • Year: 1996

Abstract

Computers are being used to achieve increasingly sophisticated control for large and complex systems. Many of these systems require a large shared state-space or database. Thus, handling real-time concurrent accesses to a shared database is an essential feature for modern fault-tolerant systems. Many fault-tolerant systems, such as MAFT (Multicomputer Architecture for Fault Tolerance), FTP (Fault-Tolerant Processor), FTPP (Fault-Tolerant Parallel Processors) and Delta-4, have been implemented to tolerate various types of failures uniformly. However, most of these either lack the notion of a shared state-space or do not efficiently support parallel tasks that concurrently access a shared state-space. We use a processor-specialization approach to increase the effectiveness of replication and, consequently, achieve cost-effective fault tolerance in such systems. The SNMR (specialized N-modular redundancy) protocol has been developed based on these concepts. Compared to many existing Byzantine-resilient systems, the SNMR approach incurs less overhead and can be easily parameterized to fit various fault models.