A group membership service for large-scale grids

Authors:
Fernando Castor Filho;Augusta Marques;Raphael Y. de Camargo;Fabio Kon
Affiliations:
University of Pernambuco;University of Pernambuco;University of São Paulo;University of São Paulo
Venue:
Proceedings of the 6th international workshop on Middleware for grid computing
Year:
2008

Abstract

In this paper, we propose a decentralized group membership service that can be incorporated into existing grid middleware to make it more reliable. This service includes a flexible failure detector that adapts dynamically to changing network conditions and can be configured with a number of failure recovery strategies. Moreover, it disseminates information about membership changes (new processes, failures, etc.) in a scalable and efficient manner. We conducted a preliminary evaluation of the proposed service by simulating a grid with up to 140 nodes distributed across three domains separated by a wide-area network. This evaluation showed that the proposed service performs well both in the absence and in the presence of process failures.
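
The abstract does not say how the failure detector adapts to network conditions. As a rough illustration only, the sketch below shows one common way such adaptation can work: the timeout is recomputed from recently observed heartbeat inter-arrival times, so it widens when the network becomes jittery. The class name `AdaptiveFailureDetector`, its parameters (`window`, `safety_factor`, `initial_timeout`), and the mean-plus-deviation rule are all assumptions made for this example, not the design described in the paper.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveFailureDetector:
    """Hypothetical heartbeat monitor for one remote process.

    Illustrative assumption only; this is not the detector from the paper.
    """

    def __init__(self, window=100, safety_factor=4.0, initial_timeout=1.0):
        # Sliding window of recent heartbeat inter-arrival times (seconds).
        self.intervals = deque(maxlen=window)
        self.safety_factor = safety_factor
        self.initial_timeout = initial_timeout
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Mean inter-arrival time plus a margin that grows with jitter."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        return mean(self.intervals) + self.safety_factor * pstdev(self.intervals)

    def suspects(self, now):
        """True if no heartbeat arrived within the current timeout."""
        if self.last_arrival is None:
            return False
        return now - self.last_arrival > self.timeout()


# Usage: a steady link yields a tight timeout, a jittery one a looser timeout.
steady = AdaptiveFailureDetector()
for t in [0.0, 1.0, 2.0, 3.0]:
    steady.heartbeat(t)

jittery = AdaptiveFailureDetector()
for t in [0.0, 1.0, 3.0, 4.0]:
    jittery.heartbeat(t)
```

With a larger `safety_factor` the detector makes fewer false suspicions but takes longer to notice a real crash; a membership service would typically treat a suspicion as the trigger for disseminating a membership change.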