Modeling of Correlated Failures and Community Error Recovery in Multiversion Software

  • Authors:
  • Victor F. Nicola;Ambuj Goyal

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • IEEE Transactions on Software Engineering
  • Year:
  • 1990

Quantified Score

Hi-index 0.01

Visualization

Abstract

Three aspects of the modeling of multiversion software are considered. First, the beta-binomial distribution is proposed for modeling correlated failures in multiversion software. Second, a combinatorial model for predicting the reliability of a multiversion software configuration is presented. This model can take as inputs failure distributions either from measurements or from a selected distribution (e.g. beta-binomial). Various recovery methods can be incorporated in this model. Third, the effectiveness of the community error recovery method based on checkpointing is investigated. This method appears to be effective only when the failure behaviors of program versions are lightly correlated. Two different types of checkpoint failure are also considered: an omission failure where the correct output is recognized at a checkpoint but the checkpoint fails to correct the wrong outputs and a destructive failure where the good versions get corrupted at a checkpoint.