Low-cost fault-tolerance protocol for large-scale network monitoring

  • Authors:
  • JinHo Ahn; SungGi Min; YoungIl Choi; ByungSun Lee

  • Affiliations:
  • Dept. of Computer Science, College of Information Science, Kyonggi University, Paldalgu, Suwonsi Kyonggido, Republic of Korea; Dept. of Computer Science & Engineering, Korea University, Seoul, Republic of Korea; Network Technology Lab., Electronics and Telecommunications Research Institute, Taejon, Republic of Korea; Network Technology Lab., Electronics and Telecommunications Research Institute, Taejon, Republic of Korea

  • Venue:
  • ICCS'03 Proceedings of the 2003 International Conference on Computational Science: Part III
  • Year:
  • 2003

Abstract

A distributed hierarchical network monitoring model has been proposed to solve the scalability problem of the centralized model. In this distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be monitored continuously and correctly until those managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, which allows the managers to make efficient use of their organizational structure. As a result, the protocol minimizes failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, once the failed managers have been repaired, ensures their immediate and consistent recovery.
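
The hierarchical model and the problem the abstract describes can be illustrated with a minimal Python sketch. It is not DMFTP itself, whose mechanics the abstract does not specify. The class names, the `alive` flag, and the per-element load metric are assumptions made purely for illustration. The sketch shows a main manager collecting aggregates from domain managers, and how a crashed domain manager leaves its network elements unmonitored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DomainManager:
    """Mid-level manager for one domain (hypothetical model, not from the paper)."""
    name: str
    element_loads: Dict[str, float]  # element id -> last measured load (assumed metric)
    alive: bool = True               # assumed stand-in for "has not crashed"

    def aggregate(self) -> dict:
        # Summarise the domain so the main manager need not poll every element.
        loads = list(self.element_loads.values())
        mean = sum(loads) / len(loads) if loads else 0.0
        return {"elements": len(loads), "mean_load": mean}


@dataclass
class MainManager:
    """Top-level manager that gathers aggregate information from domain managers."""
    domains: List[DomainManager] = field(default_factory=list)

    def collect(self) -> Tuple[dict, List[str]]:
        reports, unmonitored = {}, []
        for dm in self.domains:
            if dm.alive:
                reports[dm.name] = dm.aggregate()
            else:
                # Without a fault-tolerance protocol, these elements stay
                # unmonitored until the crashed manager is repaired.
                unmonitored.extend(sorted(dm.element_loads))
        return reports, unmonitored


if __name__ == "__main__":
    a = DomainManager("A", {"r1": 0.5, "r2": 0.25})
    b = DomainManager("B", {"r3": 0.9}, alive=False)  # simulated crash
    print(MainManager([a, b]).collect())
```

In this toy model, a fault-tolerance protocol of the kind the paper proposes would correspond to another live manager taking over the elements listed in `unmonitored`. How DMFTP selects that manager and detects the crash is described in the paper, not here.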