Fault-tolerant systems with concurrent error-locating capability

  • Authors:
  • JianHui Jiang; YingHua Min; ChengLian Peng

  • Affiliations:
  • Department of Computer Science and Technology, Tongji University, Shanghai 200092, P.R. China and Department of Computing and Information Technology, Fudan University, Shanghai 200433, P.R. China; Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, P.R. China; Department of Computing and Information Technology, Fudan University, Shanghai 200433, P.R. China

  • Venue:
  • Journal of Computer Science and Technology
  • Year:
  • 2003

Abstract

Fault-tolerant systems have found wide application in military, industrial and commercial areas. Most of these systems are built using multiple-modular redundancy or error-control coding techniques, and they rely on fault-tolerance-specific components (such as voters, switchers, encoders, or decoders) to implement error-detecting or error-correcting functions. However, the problem of detecting, locating, or correcting errors in these fault-tolerance-specific components themselves has not yet been properly solved, which greatly affects the dependability of the whole fault-tolerant system. This paper presents a theory of robust fault-masking digital circuits that characterizes fault-tolerant systems with concurrent error-locating capability, together with a new scheme of dual-modular redundant systems with a partially robust fault-masking property. A basic robust fault-masking circuit is composed of a basic functional circuit and an error-locating corrector. Such a circuit can not only correct errors concurrently but also locate them concurrently. Following this circuit model, in a partially robust fault-masking dual-modular redundant system, two redundant modules based on alternating-complementary logic constitute the basic functional circuit, and an error-correction-specific circuit, named the alternating-complementary corrector, serves as the error-locating corrector. The performance of the scheme, including hardware complexity and time delay, is analyzed.
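To make the idea of concurrent error location in a dual-modular pair more concrete, here is a minimal Python sketch built on classical alternating logic. It is not the paper's alternating-complementary corrector, whose internal design the abstract does not specify. The sketch assumes each module realizes a self-dual function (here the 3-input majority function), so a fault-free module fed x in the first phase and its complement in the second must produce complementary outputs. A module whose outputs fail to alternate (for example, an output stuck at 0 or 1) is located as faulty, and the hypothetical `locate_and_correct` function forwards the other module's result. All names (`majority3`, `alternating_run`, `locate_and_correct`) are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]
Module = Callable[[Bits], int]  # a module maps an input vector to one output bit


def complement(x: Bits) -> Bits:
    """Bitwise complement of an input vector."""
    return tuple(1 - b for b in x)


def majority3(x: Bits) -> int:
    """3-input majority: a self-dual function, so maj(~x) == ~maj(x)."""
    return 1 if sum(x) >= 2 else 0


def is_self_dual(f: Module, n: int) -> bool:
    """Check f(~x) == ~f(x) for every n-bit input."""
    return all(f(complement(x)) == 1 - f(x) for x in product((0, 1), repeat=n))


def alternating_run(module: Module, x: Bits) -> Tuple[int, int]:
    """Phase 1 applies x, phase 2 applies ~x; returns both outputs."""
    return module(x), module(complement(x))


def locate_and_correct(
    m1: Module, m2: Module, x: Bits
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical corrector for a duplicated self-dual module.

    Returns (output, located_faulty_module): output is the phase-1 result
    to forward (None if uncorrectable); located_faulty_module is 1 or 2,
    or None if no faulty module was located.
    """
    a1, a2 = alternating_run(m1, x)
    b1, b2 = alternating_run(m2, x)
    m1_alternates = a1 != a2
    m2_alternates = b1 != b2
    if m1_alternates and m2_alternates:
        # Both pass the alternation check: agreement means a fault-free result;
        # disagreement is an alternation-preserving error this check cannot locate.
        return (a1, None) if a1 == b1 else (None, None)
    if m1_alternates:
        return a1, 2  # module 2 failed to alternate: locate it, mask with module 1
    if m2_alternates:
        return b1, 1  # module 1 failed to alternate: locate it, mask with module 2
    return None, None  # neither module alternates: uncorrectable


if __name__ == "__main__":
    stuck_at_0: Module = lambda x: 0  # output line stuck at 0
    print(locate_and_correct(majority3, stuck_at_0, (1, 1, 0)))  # (1, 2)
```

A fault that corrupts a module's output while preserving alternation escapes this check, so the sketch only reports the disagreement without locating it. Functions that are not self-dual must first be embedded in a self-dual function (standard alternating-logic practice adds an extra period input) before this kind of check applies.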