Refined failure remediation for IT change management systems

  • Authors:
  • Guilherme Sperb Machado;Weverton Luis da Costa Cordeiro;Alan Diego dos Santos;Juliano Wickboldt;Roben Castagna Lunardi;Fabrício Girardi Andreis;Cristiano Bonato Both;Luciano Paschoal Gaspary;Lisandro Zambenedetti Granville;David Trastour;Claudio Bartolini

  • Affiliations:
  • Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;Institute of Informatics, UFRGS, Brazil;HP Laboratories Bristol, UK and HP Laboratories Palo Alto;HP Laboratories Bristol, UK and HP Laboratories Palo Alto

  • Venue:
  • IM'09 Proceedings of the 11th IFIP/IEEE international conference on Symposium on Integrated Network Management
  • Year:
  • 2009

Abstract

To deal with failures in the deployment of IT changes and to always leave IT infrastructures in consistent states, we proposed in previous work a solution to automate the generation of rollback plans in IT change management systems. The solution was based on a mechanism that treats a Request for Change (RFC), or parts of it, as a single atomic transaction. In this work, we extend our previous investigation and present a more flexible and fine-grained treatment of failures. The paper first presents extensions to our conceptual model in order (i) to give IT operators some flexibility in defining rollback actions, for example by allowing the rollback plan to be something other than a reversed change plan; and (ii) to execute different recovery activities depending on the cause and location of a problem. The paper then focuses on a refined way to handle and treat failures in change deployments. We follow the ITIL version 3 best practices, which suggest that, depending on the RFC context, the human operator can classify activities as reversible or irreversible. Such classification allows change management systems to automatically generate more accurate remediation plans. The proposal takes into account not only a precise way to define how rollback plans will be generated, but also an intuitive method that enables the operator to define compensation activities in order to complete the RFC successfully, even when failures occur. To prove the concept and its technical feasibility, we have materialized our solution in the CHANGELEDGE prototype, which uses elements of the Business Process Execution Language (BPEL) to generate correct remediation plans for handling failures in IT change management systems.
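
The ideas in the abstract (reversible versus irreversible activities, operator-defined rollback actions, and compensation activities) can be illustrated with a minimal Python sketch. It is not the CHANGELEDGE implementation and does not use BPEL. The names `Activity` and `remediation_plan`, the example activities, and the rule that rollback stops at the nearest irreversible activity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    """One step of a change plan (hypothetical model, not the paper's)."""
    name: str
    reversible: bool = True             # classification chosen by the operator
    undo: Optional[str] = None          # operator-defined rollback action, if any
    compensation: Optional[str] = None  # alternative action to try on failure


def remediation_plan(plan: List[Activity], failed_index: int) -> List[str]:
    """Return the remediation steps for a failure at plan[failed_index].

    Assumptions of this sketch: the failed activity left no effects of its
    own, a compensation activity takes precedence over rollback, and
    rollback cannot proceed past an irreversible activity.
    """
    failed = plan[failed_index]
    if failed.compensation is not None:
        # Try to complete the RFC instead of undoing it.
        return [failed.compensation]

    steps: List[str] = []
    # Walk back through the activities that already completed.
    for activity in reversed(plan[:failed_index]):
        if not activity.reversible:
            break  # assumed rule: stop at the nearest irreversible activity
        # Use the operator-defined action, else a default reversed step.
        steps.append(activity.undo or f"undo {activity.name}")
    return steps


example_plan = [
    Activity("migrate database schema", reversible=False),
    Activity("stop web server", undo="start web server"),
    Activity("install package"),
    Activity("restart web server", compensation="start standby server"),
]

if __name__ == "__main__":
    print(remediation_plan(example_plan, 2))
    print(remediation_plan(example_plan, 3))
```

In this sketch, a failure in "install package" rolls back only the "stop web server" activity, because the schema migration is marked irreversible, while a failure in "restart web server" triggers its compensation activity instead of a rollback.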