Guided Problem Diagnosis through Active Learning

Authors:
Songyun Duan; Shivnath Babu
Venue:
ICAC '08 Proceedings of the 2008 International Conference on Autonomic Computing
Year:
2008

Abstract

There is widespread interest today in developing tools that can diagnose the cause of a system failure accurately and efficiently from monitoring data collected from the system. Over time, this monitoring data will contain two types of failure data: (i) annotated failure data L, which is monitoring data collected from failure states of the system where the cause of failure has been diagnosed and attached to the data as an annotation; and (ii) unannotated failure data U, for which no diagnosis has been attached. Previous work on wholly or partially automated diagnosis focused on L or U in isolation. In this paper, we argue that it is important to consider L and U together to improve the overall accuracy of diagnosis and, in particular, to proactively move instances from U to L. However, such movement requires manual diagnosis effort from system administrators. Since manual diagnosis is expensive and time-consuming, we propose an algorithm that makes the best use of manual effort while maximizing the benefit gained from newly diagnosed instances. We report an experimental evaluation of our algorithm using data from a variety of failures (both single failures and multiple correlated failures) injected in a testbed, as well as synthetic data.
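
The idea of proactively moving instances from U to L can be illustrated with a small sketch. The abstract does not specify the paper's algorithm, so the code below is not the authors' method: it substitutes plain margin-based uncertainty sampling over a nearest-centroid classifier, a common active-learning baseline. All function names, the feature vectors, and the cause labels are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids(labeled):
    """Mean feature vector for each diagnosed cause in L."""
    groups = defaultdict(list)
    for features, cause in labeled:
        groups[cause].append(features)
    return {
        cause: [sum(col) / len(rows) for col in zip(*rows)]
        for cause, rows in groups.items()
    }


def margin(features, cents):
    """Gap between the two nearest centroids; a small gap means ambiguity."""
    dists = sorted(euclidean(features, c) for c in cents.values())
    return dists[1] - dists[0] if len(dists) > 1 else float("inf")


def pick_for_diagnosis(labeled, unlabeled):
    """Index of the instance in U that the current model is least sure about.

    Assumption: "most useful to diagnose manually" is approximated by
    "smallest margin", which is a stand-in heuristic, not the paper's criterion.
    """
    cents = centroids(labeled)
    return min(range(len(unlabeled)), key=lambda i: margin(unlabeled[i], cents))


def move_to_labeled(labeled, unlabeled, index, cause):
    """Record an administrator's diagnosis by moving one instance from U to L."""
    labeled.append((unlabeled.pop(index), cause))
```

With three annotated instances (two labeled "disk" near [0.85, 0.15], one labeled "network" at [0.1, 0.9]) and an unannotated set containing [0.85, 0.15], [0.5, 0.5], and [0.2, 0.8], `pick_for_diagnosis` selects the middle instance, since it lies almost equally far from both centroids. Spending the administrator's effort on that instance is the kind of trade-off the abstract describes, although the paper's own selection criterion may differ.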