Distributed Peer-to-Peer Control in Harness

  • Authors:
  • C. Engelmann; Stephen Scott; G. A. Geist, II

  • Venue:
  • ICCS '02 Proceedings of the International Conference on Computational Science-Part II
  • Year:
  • 2002

Abstract

Harness is an adaptable, fault-tolerant virtual machine environment for next-generation heterogeneous distributed computing, developed as a follow-on to PVM. It additionally enables the assembly of applications from plug-ins and provides fault tolerance. This work describes the distributed control, which manages global state replication to ensure high availability of service. Group communication services achieve agreement on an initial global state and a linear history of global state changes at all members of the distributed virtual machine. This global state is replicated to all members so that the system can recover easily from single, multiple, and cascaded faults. A peer-to-peer ring network architecture and tunable multi-point failure conditions provide heterogeneity and scalability. Finally, the integration of the distributed control into the multi-threaded kernel architecture of Harness offers a fault-tolerant global state database service for plug-ins and applications.
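
The replication idea in the abstract can be illustrated with a toy model. The following Python sketch is not the Harness protocol or its API: the names Member, Ring, broadcast, fail, and recover are hypothetical, and the real group communication services are replaced here by a simple in-process loop. It only shows, under those assumptions, how every member of a ring can hold the full global state plus a linear history of changes, so that a failed member can be restored from any surviving replica.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical ring member holding a full replica of the global state."""
    name: str
    alive: bool = True
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # linear history of applied changes

    def apply(self, seq, key, value):
        # Accept a change only if it is the next entry in the linear history.
        if seq == len(self.history):
            self.state[key] = value
            self.history.append((seq, key, value))


class Ring:
    """Toy ring: a change is passed member to member, skipping failed ones."""

    def __init__(self, names):
        self.members = [Member(n) for n in names]
        self.next_seq = 0

    def broadcast(self, origin, key, value):
        seq = self.next_seq
        n = len(self.members)
        delivered = 0
        # Visit each ring position once, starting at the origin.
        for step in range(n):
            member = self.members[(origin + step) % n]
            if member.alive:
                member.apply(seq, key, value)
                delivered += 1
        if delivered:
            self.next_seq += 1
        return delivered

    def fail(self, index):
        self.members[index].alive = False

    def recover(self, index):
        # Assumes at least one live member; copy its state and history.
        donor = next(m for m in self.members if m.alive)
        member = self.members[index]
        member.state = dict(donor.state)
        member.history = list(donor.history)
        member.alive = True
```

In this sketch, failing two of four members, broadcasting another change, and then calling recover on one of the failed members leaves it with the same state and history as the survivors. A real system would additionally need failure detection and an agreement step among members, which this model deliberately omits.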