Automated debugging of Service Level Objective (SLO) violations is a critical business requirement for today's enterprise applications. However, the increasing scale and complexity of these systems make building such solutions challenging, mainly for two reasons: (1) the large number of metrics that are potential causes of a violation, and (2) the large number of data points to analyze. Existing techniques are either highly compute-intensive, and thus not viable on large volumes of data, or compromise on accuracy. To balance scalability and accuracy simultaneously, we propose to intelligently prune the search space. We apply feature selection to remove irrelevant and redundant metrics, and we then identify temporal regions of interest to narrow the analysis to a smaller set of data points. We present a comparative experimental evaluation of the proposed approach against existing approaches.
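The two pruning steps above can be sketched as follows. This is a minimal illustration, not the authors' actual method: it assumes a simple filter-style feature selector (correlation with the SLO signal for relevance, pairwise correlation for redundancy) and a threshold-based definition of temporal regions of interest; the function names, thresholds, and windowing scheme are all hypothetical.

```python
import numpy as np

def select_relevant_metrics(metrics, slo, relevance_thresh=0.3, redundancy_thresh=0.9):
    """Filter-style feature selection (hypothetical sketch): keep metrics that
    correlate with the SLO signal, then drop metrics that are near-duplicates
    of an already selected one."""
    n_metrics = metrics.shape[1]
    # Relevance: absolute correlation of each metric with the SLO time series.
    relevance = [abs(np.corrcoef(metrics[:, j], slo)[0, 1]) for j in range(n_metrics)]
    kept = []
    for j in sorted(range(n_metrics), key=lambda j: -relevance[j]):
        if relevance[j] < relevance_thresh:
            break  # remaining candidates are even less relevant
        # Redundancy: skip metrics highly correlated with one already kept.
        if all(abs(np.corrcoef(metrics[:, j], metrics[:, k])[0, 1]) < redundancy_thresh
               for k in kept):
            kept.append(j)
    return kept

def regions_of_interest(slo, threshold, pad=2):
    """Indices of data points at or near SLO violations: each violating point
    plus a small surrounding window of `pad` points on either side."""
    roi = set()
    for t in np.flatnonzero(slo > threshold):
        roi.update(range(max(0, t - pad), min(len(slo), t + pad + 1)))
    return sorted(roi)

# Synthetic demo: metric 0 drives the SLO, metric 1 is a redundant copy,
# metric 2 is unrelated noise.
rng = np.random.default_rng(0)
m0 = rng.normal(size=200)
m1 = m0 + 0.01 * rng.normal(size=200)
m2 = rng.normal(size=200)
slo = m0 + 0.1 * rng.normal(size=200)
kept = select_relevant_metrics(np.column_stack([m0, m1, m2]), slo)
```

On this synthetic data, exactly one of the two redundant metrics survives and the noise metric is discarded, so the downstream root-cause analysis runs on one metric instead of three, and only on the data points flagged by `regions_of_interest`.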