System's Availability Maximization Through Preventive Rejuvenation

  • Authors: Y. Langer; A. Urmanov

  • Affiliations: Member, IEEE, Sun Microsystems, USA. yuri_langer@sun.com; -

  • Venue: ICAC '06 Proceedings of the 2006 IEEE International Conference on Autonomic Computing
  • Year: 2006

Abstract

Preventive system maintenance is an important measure for assuring key elements of autonomic computing such as adaptation, healing, and protection. However, existing preventive maintenance methods do not meet the requirements for practical application. In this paper, we propose a new preventive maintenance approach, termed preventive rejuvenation, that makes up for some deficiencies of existing methods. The system's physical parameters and software metrics are continuously collected and processed using continuous system telemetry. These data are used to estimate the current state of the system and its components during operation. The optimal rejuvenation rule for a degrading system is formulated as a subset of the system states in which the system should be subjected to certain preventive actions so that the system's availability is maximized (or another specific goal is achieved). Uncertainty in the data, which originates from the experimental measurements, may invalidate the obtained optimal rejuvenation rule. To deal with this uncertainty, we derive acceptable bounds on variations of the obtained data for a given confidence level. On the basis of the derived bounds, requirements and rules for data gathering and processing are generated. Examples of applying the preventive rejuvenation approach to various components of a computer server are demonstrated.
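
The idea of a rejuvenation rule as a subset of system states can be illustrated with a small sketch. It is not the paper's model: the abstract gives no formulas, so everything below is an assumption made for illustration. The system is assumed to pass through ordered degradation states 0, 1, ..., m-1, with hypothetical per-step probabilities `fail[i]` (jump to failure) and `degrade[i]` (move to the next state, or to failure from the last state). The rule is restricted to a threshold form, "rejuvenate on entering state k or worse", and steady-state availability is taken as expected uptime divided by expected cycle length. The names `availability`, `best_threshold`, `t_rejuv`, and `t_repair` are likewise hypothetical.

```python
from typing import Sequence, Tuple


def availability(k: int, fail: Sequence[float], degrade: Sequence[float],
                 t_rejuv: float, t_repair: float) -> float:
    """Steady-state availability under the assumed threshold rule.

    The system runs in states 0..k-1 and is rejuvenated on entering state k.
    k == len(fail) means "never rejuvenate": every cycle ends in a failure.
    """
    up = 0.0      # expected operational time per renewal cycle
    reach = 1.0   # probability of reaching the current state without failing
    for i in range(k):
        q = fail[i] + degrade[i]      # probability of leaving state i per step
        up += reach / q               # expected sojourn in state i is 1/q
        reach *= degrade[i] / q       # leave by degrading rather than failing
    # If k is a real state, arriving there triggers rejuvenation;
    # otherwise degrading out of the last state counts as a failure.
    p_rejuv = reach if k < len(fail) else 0.0
    down = p_rejuv * t_rejuv + (1.0 - p_rejuv) * t_repair
    return up / (up + down)


def best_threshold(fail: Sequence[float], degrade: Sequence[float],
                   t_rejuv: float, t_repair: float) -> Tuple[int, float]:
    """Enumerate thresholds 1..m and return the one maximizing availability."""
    m = len(fail)
    scored = [(availability(k, fail, degrade, t_rejuv, t_repair), k)
              for k in range(1, m + 1)]
    best_a, best_k = max(scored)
    return best_k, best_a


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    fail = [0.001, 0.01, 0.1]
    degrade = [0.05, 0.05, 0.05]
    k, a = best_threshold(fail, degrade, t_rejuv=2.0, t_repair=50.0)
    print(f"rejuvenate on entering state {k}; availability = {a:.4f}")
```

Enumeration is enough here because a threshold rule over m states has only m candidates. A rule over arbitrary subsets of states, or one that accounts for measurement uncertainty as the abstract describes, would need a richer formulation than this sketch provides.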