A Case Study for Fault Tolerance Oriented Programming in Multi-core Architecture

  • Authors:
  • Lu Yang;Zhanqi Cui;Xuandong Li

  • Venue:
  • HPCC '09 Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications
  • Year:
  • 2009

Abstract

Multi-core architectures bring ordinary software developers both new challenges and new means. Reliable software system design approaches can give high confidence that long-running online software systems run correctly, but these approaches inevitably cost some efficiency. We found that the multi-core architecture is a well-suited platform for reliable software system design and can keep that cost acceptable, thanks to its parallel performance and its prevalence. In this paper we use the multi-core architecture to support software fault tolerance. This approach can make the integration of software fault tolerance and the multi-core architecture a common design choice. Following the idea of software fault tolerance, for some key software units in a system we can develop N separate versions with equivalent functionality. Each version is developed independently by an isolated group to prevent identical faults among versions. All implemented versions run separately from the same initial conditions and inputs. The outputs of all redundant versions are submitted to a decision module, which determines a single result from the multiple results and takes it as the correct output. We give a case study showing that, with the multi-core architecture, the redundant versions of a key software unit can run in parallel on different cores to improve efficiency.
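The scheme described in the abstract (N redundant versions, identical inputs, a decision module) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three versions, the function names `majority_vote` and `run_n_versions`, and the choice of strict majority voting as the decision rule are all assumptions made for illustration, and one version carries a deliberately seeded fault.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def version_a(xs):
    # Hypothetical version 1: uses the built-in sum.
    return sum(xs)


def version_b(xs):
    # Hypothetical version 2: independent implementation with an explicit loop.
    total = 0
    for x in xs:
        total += x
    return total


def version_c(xs):
    # Hypothetical version 3 with a seeded fault: it drops the last element.
    return sum(xs[:-1])


def majority_vote(results):
    """Decision module (assumed rule): return the value that a strict
    majority of versions agree on. Results must be hashable."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions: %r" % (results,))
    return value


def run_n_versions(versions, data, executor_cls=ThreadPoolExecutor):
    """Run every version concurrently on the same input, then vote.

    The default thread pool keeps the sketch simple. For CPU-bound versions
    on CPython, passing concurrent.futures.ProcessPoolExecutor as
    executor_cls is what would actually spread the work across cores.
    """
    with executor_cls(max_workers=len(versions)) as pool:
        futures = [pool.submit(version, data) for version in versions]
        results = [future.result() for future in futures]
    return majority_vote(results)
```

Calling `run_n_versions([version_a, version_b, version_c], [1, 2, 3, 4])` returns the value that two of the three versions agree on, so the faulty version is outvoted. The sketch does not handle a version that raises an exception or hangs; a real decision module would need a policy for both.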