An efficient management and automatic failover on a large-scale cluster monitoring system

Authors:
Choon Seo Park;Song-Woo Sok;Jin-Hwan Jeong;Yong-Ju Lee;Chang Soo Kim;Ok-Gee Min;Hag-Young Kim;Jae Soo Yoo
Affiliations:
Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Internet Platform Research Department, Electronics and Telecommunication Research Institute, Daejeon, Korea;Division of Information and Communication Eng., Chungbuk National University, Cheongju, Korea
Venue:
ICOSSSE '09 Proceedings of the 8th WSEAS international conference on System science and simulation in engineering
Year:
2009

Abstract

In this paper, we propose an efficient technique for automatic configuration of a large-scale cluster monitoring system and for automatic failover when commodity server nodes fail. Detecting failed nodes and completing failover for them reduces the cost of administering nodes and maintains high availability across numerous commodity nodes. Nodes are grouped by subnet, and each group has one Group Master and many leaf nodes. Leaf nodes collect monitoring data and send it to the Group Master. The Group Master saves the monitoring data received from the leaf nodes on a DB server node. When a leaf node crashes, the Cluster Master deletes it. If the Group Master crashes, the Cluster Master deletes the Group Master node and assigns a new Group Master from among the leaf nodes that are in the active state. With this automatic failover for failed nodes, we can maintain high availability on large-scale cluster systems.
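
The grouping and failover rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (ClusterMaster, Group, register, handle_crash), the /24 subnet prefix, the rule that the first node registered in a subnet becomes Group Master, and the rule that the first remaining active leaf is promoted are all assumptions made for illustration. The abstract does not say how a new Group Master is chosen, and failure detection and the DB server are left out.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network
from typing import Dict, List, Optional


@dataclass
class Group:
    """One monitoring group: a Group Master plus its active leaf nodes."""
    subnet: str
    master: Optional[str] = None
    leaves: List[str] = field(default_factory=list)


class ClusterMaster:
    """Hypothetical Cluster Master that tracks groups keyed by subnet."""

    def __init__(self, prefix_len: int = 24) -> None:
        # Assumption: groups are formed per /24 subnet.
        self.prefix_len = prefix_len
        self.groups: Dict[str, Group] = {}

    def register(self, ip: str) -> Group:
        """Place a node into the group for its subnet."""
        subnet = str(ip_network(f"{ip}/{self.prefix_len}", strict=False))
        group = self.groups.setdefault(subnet, Group(subnet))
        if group.master is None:
            # Assumption: the first node seen in a subnet becomes Group Master.
            group.master = ip
        else:
            group.leaves.append(ip)
        return group

    def handle_crash(self, ip: str) -> Optional[Group]:
        """Delete a crashed node; promote a leaf if it was the Group Master."""
        for group in self.groups.values():
            if ip in group.leaves:
                # Crashed leaf: simply remove it from the group.
                group.leaves.remove(ip)
                return group
            if group.master == ip:
                # Crashed Group Master: delete it and promote the first
                # remaining active leaf (assumed selection rule).
                group.master = group.leaves.pop(0) if group.leaves else None
                return group
        return None  # Unknown node; nothing to do.
```

In this sketch, registering 10.0.1.5, 10.0.1.6 and 10.0.1.7 puts all three in the 10.0.1.0/24 group with 10.0.1.5 as Group Master. A later call to handle_crash("10.0.1.5") removes that node and promotes 10.0.1.6. A real system would also need a heartbeat or timeout mechanism to decide when a node counts as crashed.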