Fault Tolerance in the WebCom Metacomputer

  • Authors:
  • Affiliations:
  • Venue: ICPPW '01 Proceedings of the 2001 International Conference on Parallel Processing Workshops
  • Year: 2001

Abstract

This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined that relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.
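
The idea of confining a fault to its immediate neighbourhood can be illustrated with a small sketch. This is not WebCom's actual interface or algorithm, which the abstract does not specify. It is a minimal Python illustration under stated assumptions: a parent hands referentially transparent (pure) tasks to child workers, and when a child leaves mid-task the parent swaps in a spare and re-issues only that one task. The names `Worker`, `WorkerFailure`, `run_with_replacement`, and `square` are all hypothetical.

```python
from dataclasses import dataclass, field


class WorkerFailure(Exception):
    """Raised when a (hypothetical) volunteer machine leaves mid-task."""


@dataclass
class Worker:
    name: str
    # Task ids on which this simulated worker "disappears" (assumption for the demo).
    fail_on: set = field(default_factory=set)
    # Task ids this worker actually completed.
    executed: list = field(default_factory=list)

    def run(self, task_id, fn, arg):
        if task_id in self.fail_on:
            raise WorkerFailure(f"{self.name} left during task {task_id}")
        self.executed.append(task_id)
        return fn(arg)


def run_with_replacement(tasks, workers, spares):
    """Run pure (fn, arg) tasks round-robin over workers.

    On a failure, the parent replaces only the failed worker with a spare and
    re-issues only the failed task. Because tasks are assumed to be pure,
    re-execution yields the same value and no other task is affected.
    """
    workers = list(workers)
    spares = list(spares)
    results = {}
    replacements = []  # (failed worker, replacement worker, task id)
    for task_id, (fn, arg) in enumerate(tasks):
        slot = task_id % len(workers)
        while True:
            try:
                results[task_id] = workers[slot].run(task_id, fn, arg)
                break
            except WorkerFailure:
                if not spares:
                    raise  # no replacement available: the fault propagates upward
                failed = workers[slot]
                workers[slot] = spares.pop(0)
                replacements.append((failed.name, workers[slot].name, task_id))
    return [results[i] for i in range(len(tasks))], replacements


def square(x):
    return x * x


# Usage: worker "b" leaves while running task 1 and spare "c" takes over its slot.
worker_a = Worker("a")
worker_b = Worker("b", fail_on={1})
spare_c = Worker("c")
tasks = [(square, n) for n in range(4)]
values, replacements = run_with_replacement(tasks, [worker_a, worker_b], [spare_c])
```

In this sketch the work already completed by worker `a` is never repeated: only the task that was in flight on the departed worker is re-issued, which is the sense in which the fault stays local. A hierarchical system would apply the same rule at each level, with a parent escalating only when it has no spare left.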