Experimental Analysis of a Gossip-Based Service for Scalable, Distributed Failure Detection and Consensus

Authors:
Krishnakanth Sistla;Alan D. George;Robert W. Todd
Affiliations:
High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, P.O. Box 116200, Gainesville, FL 32611-6200, USA;High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, P.O. Box 116200, Gainesville, FL 32611-6200, USA;High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, P.O. Box 116200, Gainesville, FL 32611-6200, USA
Venue:
Cluster Computing
Year:
2003



Abstract

Gossip protocols and services provide a means of detecting failures asynchronously in large distributed systems, without the limits associated with reliable multicasting for group communications. A gossip protocol can be extended so that the system reaches consensus on detected faults, either with a flat structure or with a hierarchy distributed across cooperating layers of nodes. In this paper, the performance of gossip services employing flat and hierarchical schemes is analyzed on an experimental testbed in terms of consensus time, resource utilization, and scalability. The performance of a hierarchically arranged gossip scheme is analyzed with varying group sizes and is shown to scale well. Resource utilization of the gossip-style failure detection and consensus service is measured in terms of network bandwidth and CPU utilization. Analytical models are developed for resource utilization, and performance projections are made for large system sizes.
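
To make the idea of flat gossip-based failure detection with consensus concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation and does not reproduce their testbed or results. The names `Node`, `t_cleanup`, and `rounds_to_consensus`, the round-based timing, and the deterministic rotating choice of gossip partner are all assumptions made for illustration; practical gossip services typically pick partners at random and run on wall-clock timers.

```python
class Node:
    """One participant in a flat gossip group (illustrative only)."""

    def __init__(self, node_id, n, t_cleanup):
        self.node_id = node_id
        # Hypothetical parameter: rounds of silence before a peer is suspected.
        self.t_cleanup = t_cleanup
        # silence[j] = rounds since this node last heard, directly or
        # indirectly, that node j was alive. The node's own entry stays 0.
        self.silence = [0] * n

    def tick(self):
        """Advance one gossip round: every other peer grows one round staler."""
        for j in range(len(self.silence)):
            if j != self.node_id:
                self.silence[j] += 1

    def merge(self, payload):
        """Keep the fresher (smaller) silence value for every peer."""
        self.silence = [min(a, b) for a, b in zip(self.silence, payload)]

    def suspects(self):
        """Peers this node currently believes have failed."""
        return {j for j, s in enumerate(self.silence) if s >= self.t_cleanup}


def rounds_to_consensus(n, failed, t_cleanup, max_rounds=100):
    """Return the first round in which every live node suspects exactly
    the failed set, or None if that does not happen within max_rounds."""
    failed = set(failed)
    nodes = [Node(i, n, t_cleanup) for i in range(n)]
    live = [i for i in range(n) if i not in failed]
    for r in range(1, max_rounds + 1):
        for i in live:
            nodes[i].tick()
        # Assumption: a rotating offset replaces random partner selection so
        # the run is reproducible. Lists are snapshotted before delivery so
        # all messages in a round carry state from the start of that round.
        offset = 1 + r % (n - 1)
        messages = [((i + offset) % n, list(nodes[i].silence)) for i in live]
        for target, payload in messages:
            if target not in failed:  # gossip sent to a failed node is lost
                nodes[target].merge(payload)
        if all(nodes[i].suspects() == failed for i in live):
            return r
    return None


if __name__ == "__main__":
    print(rounds_to_consensus(n=4, failed={2}, t_cleanup=3))
```

In this sketch, consensus means only that all live nodes hold the same set of suspects. A small `t_cleanup` can make live nodes suspect one another by mistake, which is one reason such a threshold has to be tuned to the group size. A hierarchical scheme of the kind the abstract describes would run this exchange within groups and then between layers, which the sketch does not attempt.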