Previous work showed that statistical analysis techniques can be used to construct compact signatures of distinct operational problems in Internet server systems. Because signatures are amenable to well-known similarity search techniques, they can be used to index past problems and to identify a given operational problem as new or recurrent. In this paper, we construct signatures with a different statistical technique, logistic regression with L1 regularization, which improves on previous work in two ways. First, the new approach works when the number of features is an order of magnitude larger than the number of samples, and it also scales to problems with over 50,000 samples. Second, we obtain encouraging results on the stability of the models and the signatures by cross-validating the accuracy of the models from one section of the data center on another section. We validate our approach on data from an Internet service testbed and from a production enterprise system comprising hundreds of servers in several data centers.
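As a rough illustration of why L1 regularization suits the case of many features and few samples, the sketch below fits an L1-penalized logistic regression with plain proximal gradient descent (soft-thresholding). It is not the solver or the signature definition used in the paper: the synthetic data, the function names, the values of `lam` and `step`, and the choice of cosine similarity for comparing weight vectors are all assumptions made for the example.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.1, step=0.1, iterations=200):
    """Minimize mean log-loss + lam * ||w||_1; the bias term is not penalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iterations):
        grad_w = [0.0] * d
        grad_b = 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-threshold each weight.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def cosine_similarity(u, v):
    """Assumed similarity measure between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_synthetic(n_samples, n_features, relevant, seed):
    """Hypothetical data: label is 1 when the relevant metrics sum above zero."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [1 if sum(row[j] for j in relevant) > 0 else 0 for row in X]
    return X, y


# Ten times more features than samples, as in the regime the abstract describes.
X, y = make_synthetic(n_samples=20, n_features=200, relevant=[0, 1], seed=0)
weights, bias = fit_l1_logistic(X, y)

# In this sketch, the "signature" is the set of features with nonzero weight.
signature = [j for j, wj in enumerate(weights) if wj != 0.0]
print(len(signature), "of", len(weights), "features selected")
```

The soft-thresholding step sets small weights exactly to zero, which is what keeps the resulting model compact. Two fitted weight vectors could then be compared with `cosine_similarity`, one simple stand-in for the similarity search the abstract mentions.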