Distributed Peer-to-Peer Control in Harness

Authors:
C. Engelmann; Stephen Scott; G. A. Geist, II
Venue:
ICCS '02 Proceedings of the International Conference on Computational Science-Part II
Year:
2002


Abstract

Harness is an adaptable, fault-tolerant virtual machine environment for next-generation heterogeneous distributed computing, developed as a follow-on to PVM. It additionally enables the assembly of applications from plug-ins and provides fault tolerance. This work describes the distributed control, which manages global state replication to ensure high availability of service. Group communication services reach agreement on an initial global state and on a linear history of global state changes at all members of the distributed virtual machine. This global state is replicated to all members so that the system can easily recover from single, multiple, and cascaded faults. A peer-to-peer ring network architecture and tunable multi-point failure conditions provide heterogeneity and scalability. Finally, integrating the distributed control into the multi-threaded kernel architecture of Harness offers a fault-tolerant global state database service for plug-ins and applications.
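
The idea of replicating a global state through a linear history of changes on a ring can be illustrated with a small sketch. This is not the Harness implementation: the names `Member`, `Ring`, `broadcast`, and `fail` are hypothetical, and a single in-process sequence counter stands in for the agreement that the group communication services would reach among real peers. It only shows why every surviving member ends up with the same state after multiple faults.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One peer holding a full replica of the global state (hypothetical model)."""
    name: str
    alive: bool = True
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # ordered (seq, key, value) entries

    def apply(self, seq, key, value):
        # Accept a change only if it is the next one in the linear history.
        if seq == len(self.history):
            self.history.append((seq, key, value))
            self.state[key] = value


class Ring:
    """Peers arranged in a logical ring; failed peers are skipped when forwarding."""

    def __init__(self, names):
        self.members = [Member(n) for n in names]
        # Assumption: a local counter replaces the distributed ordering agreement.
        self.next_seq = 0

    def live(self):
        return [m for m in self.members if m.alive]

    def fail(self, name):
        for m in self.members:
            if m.name == name:
                m.alive = False

    def broadcast(self, origin, key, value):
        """Pass one state change once around the ring, starting at `origin`."""
        start = next(i for i, m in enumerate(self.members) if m.name == origin)
        if not self.members[start].alive:
            raise RuntimeError(f"{origin} has failed and cannot originate changes")
        seq = self.next_seq
        self.next_seq += 1
        n = len(self.members)
        for step in range(n):
            peer = self.members[(start + step) % n]
            if peer.alive:
                peer.apply(seq, key, value)
        return seq


def demo():
    ring = Ring(["a", "b", "c", "d"])
    ring.broadcast("a", "plugin.x", "loaded")
    ring.fail("b")  # two members fail between updates
    ring.fail("c")
    ring.broadcast("d", "plugin.y", "loaded")
    return ring
```

Because each change carries a position in one shared history and every live member applies changes in that order, any surviving member can serve the complete global state; in this sketch, a failed member simply stops at the last change it received.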