Achieving Scalable Cluster System Analysis and Management with a Gossip-Based Network Service

  • Authors:
  • D. E. Collins;A. D. George;R. A. Quander

  • Venue:
  • LCN '01 Proceedings of the 26th Annual IEEE Conference on Local Computer Networks
  • Year:
  • 2001

Abstract

Clusters of workstations are increasingly used for applications requiring high levels of both performance and reliability. Certain fundamental services are highly desirable to achieve these twin goals of network-based cluster system analysis and management. Among these services is the ability to detect network and node failures and the capability to efficiently determine computer and network load levels. Furthermore, the ability to allow for the distribution of administrative directives is also integral to the goal of cluster management. This paper presents a scalable approach to providing these vital support capabilities for distributed computing integrated into a cluster management system. Previous approaches to cluster management have suffered from problems of scalability and the inability to properly support heterogeneous systems in a non-proprietary fashion. This cluster management system employs gossip techniques to address the problem of scalability in network-based system management. The results of two case studies show that the cluster management system is scalable and has little adverse impact on the performance of sequential and parallel applications running on the managed system.