Nail-it-down: nailing and fixing configuration faults in cloud environments

  • Authors: Kalapriya Kannan; Anuradha Bhamidipaty
  • Affiliations: IBM India Pvt Ltd, New Delhi, India; IBM India Pvt Ltd, Bengaluru, India
  • Venue: Proceedings of the ACM International Conference on Computing Frontiers
  • Year: 2013

Abstract

Faults due to the configuration of resources account for the majority of errors in distributed software systems. Yet the problem of identifying faulty configurations remains largely unsolved. Current approaches to fault identification focus on event correlation techniques, which suffer from the limited granularity of the data generated by software components. As the complexity of cloud environments increases, resource sharing increases many-fold, making it even harder to isolate configuration faults through the analysis of events. In this paper, we propose a scalable approach that not only identifies the presence of a configuration fault but also attempts to nail down the parameter that is the source of the observed fault. We leverage knowledge of the shared resources in the environment and use a simple matrix representation to provide near real-time analysis of faults. This enables the solution to be used for both reactive management and automated proactive problem determination. Simulation experiments demonstrate that our approach identifies configuration faults with reduced time and increased accuracy. Our algorithm gracefully handles the complexity of the problem as the system size grows.
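
The abstract does not spell out the matrix representation, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a binary component-by-parameter matrix (1 where a component depends on a shared configuration parameter) and a simple scoring rule that favors parameters used by faulty components and not by healthy ones. All names here (build_matrix, rank_suspects, the component and parameter names) are hypothetical.

```python
from typing import Dict, List, Set


def build_matrix(deps: Dict[str, Set[str]], params: List[str]) -> Dict[str, List[int]]:
    """One row per component, one column per parameter (assumed layout).

    A cell is 1 if the component depends on that parameter, else 0.
    """
    return {comp: [1 if p in used else 0 for p in params]
            for comp, used in deps.items()}


def rank_suspects(matrix: Dict[str, List[int]],
                  params: List[str],
                  faulty: Set[str]) -> List[str]:
    """Rank parameters by (faulty users - healthy users), highest first.

    This scoring rule is an assumption made for illustration only.
    """
    scored = []
    for j, param in enumerate(params):
        hits = sum(row[j] for comp, row in matrix.items() if comp in faulty)
        misses = sum(row[j] for comp, row in matrix.items() if comp not in faulty)
        scored.append((hits - misses, param))
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [param for _, param in scored]


# Hypothetical environment: three components sharing three parameters.
params = ["db.pool_size", "jvm.heap", "net.mtu"]
deps = {
    "web": {"db.pool_size", "jvm.heap"},
    "api": {"db.pool_size", "net.mtu"},
    "batch": {"jvm.heap"},
}
matrix = build_matrix(deps, params)
suspects = rank_suspects(matrix, params, faulty={"web", "api"})
print(suspects)
```

In this toy setup, the parameter shared by both faulty components and by no healthy component ranks first. A single pass over the matrix columns is what makes this kind of lookup cheap enough for near real-time use, under the assumptions stated above.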