Three research challenges at the intersection of machine learning, statistical induction, and systems

Authors:
Moises Goldszmidt;Ira Cohen;Armando Fox;Steve Zhang
Affiliations:
Hewlett-Packard Labs, Palo Alto, CA;Hewlett-Packard Labs, Palo Alto, CA;Computer Science Department, Stanford University;Computer Science Department, Stanford University
Venue:
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Year:
2005

Citing 21
Cited 6

Information-based objective functions for active data selection

Neural Computation
An introduction to computational learning theory

An introduction to computational learning theory
Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

Machine Learning
Bayesian Network Classifiers

Machine Learning - Special issue on learning with probabilistic representations
Adaptive Probabilistic Networks with Hidden Variables

Machine Learning - Special issue on learning with probabilistic representations
An introduction to support vector machines and other kernel-based learning methods

An introduction to support vector machines and other kernel-based learning methods
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Neural Networks for Pattern Recognition

Neural Networks for Pattern Recognition
The Vision of Autonomic Computing

Computer
Support Vector Machine Active Learning with Applications to Text Classification

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Using probabilistic reasoning to automate software tuning

Using probabilistic reasoning to automate software tuning
Ensembles of Models for Automated Diagnosis of System Performance Problems

DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Failure Diagnosis Using Decision Trees

ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization

ICAC '05 Proceedings of the Second International Conference on Autonomic Computing
Using computers to diagnose computer problems

HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Correlating instrumentation data to system states: a building block for automated diagnosis and control

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Using Magpie for request extraction and workload modelling

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
A study of cross-validation and bootstrap for accuracy estimation and model selection

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Sequential update of Bayesian network structure

UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
On the sample complexity of learning Bayesian networks

UAI'96 Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence

Why did my PC suddenly slow down?

SYSML'07 Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques
Fingerprinting the datacenter: automated classification of performance crises

Proceedings of the 5th European conference on Computer systems
Automated experiment-driven management of (database) systems

HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
A case for machine learning to optimize multicore performance

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Software error early detection system based on run-time statistical analysis of function return values

HotACI'06 Proceedings of the First international conference on Hot topics in autonomic computing
Performance optimization of deployed software-as-a-service applications

Journal of Systems and Software

Abstract

Recent research [2, 12, 27, 10, 1] has shown encouraging results for performance debugging and for failure detection and diagnosis in systems, using approaches that automatically induce models and derive correlations from observed data. We believe that realizing the potential of this line of research will require surmounting fundamental challenges that arise not from the modeling techniques themselves but from applying those techniques to real-world systems. We formulate three such challenges. First, as new data is collected from a system, previously induced models must be continuously assessed and validated, with the ultimate aim of achieving online adaptation to system changes. Second, human operators must be able to interact effectively with the models, including interpreting model findings to generate explanations, providing feedback that improves the models, and identifying false positives and missed detections. Third, it should be possible to formally manipulate "signatures" of system state as represented by these models, allowing us to query the system's past to identify recurring problems and manually annotate them with additional information. We contend that the specifics of this problem domain not only raise these challenges but also provide the knowledge base from which to derive well-engineered solutions to them. We suggest some possible strategies for addressing each challenge and show how they arise in the context of a real example.
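
As a rough illustration of the first challenge (continuously assessing a previously induced model as new data arrives), the sketch below tracks a model's accuracy over a sliding window of recent labeled observations and flags the model as stale when accuracy drops. This is not the authors' method; the class name `SlidingWindowValidator`, the window size, and the accuracy threshold are all hypothetical choices made for the example.

```python
from collections import deque


class SlidingWindowValidator:
    """Hypothetical helper: tracks a model's accuracy on recent observations."""

    def __init__(self, window_size=50, min_accuracy=0.8):
        # Each entry is True when the model's prediction matched the outcome.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy  # assumed threshold, not from the paper

    def record(self, predicted, actual):
        """Store whether the latest prediction agreed with the observed state."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Fraction of correct predictions in the window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_stale(self):
        """True once the window is full and accuracy is below the threshold."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


if __name__ == "__main__":
    validator = SlidingWindowValidator(window_size=4, min_accuracy=0.75)
    # The model agrees with the observed system state at first ...
    for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
        validator.record(predicted, actual)
    print(validator.is_stale())
    # ... then the system changes and recent predictions start to miss.
    for predicted, actual in [(1, 0), (1, 0)]:
        validator.record(predicted, actual)
    print(validator.is_stale())
```

A stale flag here would only be a trigger for re-inducing or adapting the model; how to adapt it online is the open question the abstract raises.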