Specification-enhanced policies for automated management of changes in IT systems

  • Authors:
  • Chetan Shankar;Vanish Talwar;Subu Iyer;Yuan Chen;Dejan Milojicić;Roy Campbell

  • Affiliations:
  • University of Illinois at Urbana-Champaign;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;University of Illinois at Urbana-Champaign

  • Venue:
  • LISA '06 Proceedings of the 20th conference on Large Installation System Administration
  • Year:
  • 2006

Abstract

Enterprise and grid computing systems are complex and subject to a broad range of changes, such as configuration updates, failures, and performance degradations. These changes affect infrastructure elements such as computation and storage nodes, applications, and system management elements such as monitoring infrastructures. The best practices system administrators use today to manage these changes are manual and ad hoc. In large, complex installations, this leads to high operational costs, broken closed-loop automation, and reduced agility. Providing administrators with tools and mechanisms that automate the reaction to these changes is highly desirable and is an active research area. Policy-based management using Event-Condition-Action (ECA) rules is a well-known approach to such automated change management, in which management actions are executed when specified events and conditions are observed. In complex systems, interdependent components cause a single change to generate multiple events, which in turn trigger multiple rules. Because the order in which rule actions execute determines system behavior, that order must be reasoned about explicitly. ECA rules lack the explicit action specifications such reasoning requires and are therefore ill-suited to specifying management rules. In this paper, we propose a specification-enhanced ECA model called Event-Condition-Precondition-Action-Postcondition (ECPAP) for designing adaptation rules. ECPAP rules contain action specifications in first-order predicate logic, enabling us to develop reasoning algorithms that determine the enforcement order of multiple rules. The enforcement order is represented as a Boolean Interpreted Petri Net workflow. We introduce a new notion, called enforcement semantics, that provides guarantees about rule ordering. We have built an adaptation framework using the ECPAP model and have demonstrated it for automated change management of the Ganglia and HP OpenView monitoring systems. Our evaluation of the framework illustrates the significance of the ECPAP model and demonstrates its applicability to managing complex IT environments.
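The abstract does not spell out the ordering algorithm, so the following is only a simplified illustration of why pre- and postconditions make ordering possible. It assumes conditions are reduced to sets of ground facts rather than first-order formulas, and it uses a plain topological sort (Python's `graphlib.TopologicalSorter`) in place of the paper's Boolean Interpreted Petri Net workflow. The class `ECPAPRule`, the helpers `triggered`, `enforcement_order`, and `preconditions_hold`, and all rule and fact names are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class ECPAPRule:
    """Hypothetical, simplified ECPAP rule: conditions are sets of ground facts."""
    name: str
    event: str                     # event that triggers the rule
    condition: frozenset[str]      # facts that must hold for the rule to fire
    precondition: frozenset[str]   # facts the action needs before it executes
    action: str                    # opaque action label (illustrative only)
    postcondition: frozenset[str]  # facts the action establishes


def triggered(rules, event, facts):
    """Return the rules whose event matches and whose condition holds."""
    return [r for r in rules if r.event == event and r.condition <= facts]


def enforcement_order(rules):
    """Order rules so each runs after any rule whose postcondition supplies
    part of its precondition. Raises graphlib.CycleError on cyclic dependencies."""
    preds = {
        r.name: {s.name for s in rules
                 if s is not r and s.postcondition & r.precondition}
        for r in rules
    }
    return list(TopologicalSorter(preds).static_order())


def preconditions_hold(order, rules, facts):
    """Simulate an order: every precondition must hold when its action runs."""
    by_name = {r.name: r for r in rules}
    state = set(facts)
    for name in order:
        rule = by_name[name]
        if not rule.precondition <= state:
            return False
        state |= rule.postcondition
    return True


# Hypothetical rules that all fire on a single node failure.
rules = [
    ECPAPRule("restart_agent", "node_failure", frozenset({"agent_installed"}),
              frozenset({"node_up"}), "start monitoring agent",
              frozenset({"agent_running"})),
    ECPAPRule("reregister_agent", "node_failure", frozenset(),
              frozenset({"agent_running"}), "re-add node to collector",
              frozenset({"node_monitored"})),
    ECPAPRule("reboot_node", "node_failure", frozenset(),
              frozenset(), "power-cycle node", frozenset({"node_up"})),
]
facts = {"agent_installed"}
fired = triggered(rules, "node_failure", facts)
order = enforcement_order(fired)
print(order)  # ['reboot_node', 'restart_agent', 'reregister_agent']
```

Executing the three actions in the order they were listed would fail, because `restart_agent` needs `node_up` before `reboot_node` has established it; the derived order avoids this. A flat list is a deliberate simplification here: a Petri net workflow can also express concurrency between independent actions, and a fuller treatment would likely have to handle conflicting postconditions, which this sketch ignores.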