Fastpath Optimizations for Cluster Recovery in Shared-Disk Systems

  • Authors:
  • Randal Burns

  • Affiliations:
  • Johns Hopkins University

  • Venue:
  • Proceedings of the 2004 ACM/IEEE conference on Supercomputing
  • Year:
  • 2004

Abstract

We describe the design and implementation of a clustering service for a high-performance, shared-disk file system. The service provides failure detection and recovery, reliable end-to-end messaging, and a centralized and recoverable management interface. We implement novel optimizations in the voting protocol that resolves cluster membership. These optimizations allow clusters to form as quickly as possible without introducing livelock or requiring timeout parameters to be tuned carefully. Our treatment includes performance results that quantify the scalability of the system and measure recovery times.