Shepherd: node monitors for fault-tolerant distributed process execution in OSIRIS

  • Authors:
  • Diego Milano; Nenad Stojnić

  • Affiliations:
  • University of Basel, Switzerland; University of Basel, Switzerland

  • Venue:
  • Proceedings of the 5th International Workshop on Enhanced Web Service Technologies
  • Year:
  • 2010

Abstract

OSIRIS is a middleware for the composition and orchestration of distributed web services. It follows a decentralized P2P approach to process execution, which already provides some degree of resilience to faults and high performance in large-scale computational clusters. In this paper, we present ongoing work aimed at improving the fault tolerance capabilities of OSIRIS. We introduce new architectural elements into OSIRIS for maintaining a virtual stable storage and for monitoring the activities of service instances, together with algorithms that allow execution to survive failures that the system currently cannot cope with.