Alert Detection in System Logs

Authors:
Adam J. Oliner; Alex Aiken; Jon Stearley
Venue:
ICDM '08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Abstract

We present Nodeinfo, an unsupervised algorithm for anomaly detection in system logs. We demonstrate Nodeinfo's effectiveness on data from four of the world's most powerful supercomputers. The logs represent over 746 million processor-hours, and the anomalous events in them, called alerts, were manually tagged for scoring; the goal is to automatically identify the regions of the log that contain those alerts. We formalize the alert detection task in these terms, describe how Nodeinfo uses the information entropy of message terms to identify alerts, and present an online version of the algorithm, which is now in production use. This is the first work to investigate alert detection on several publicly available supercomputer system logs, thereby providing a reproducible performance baseline.
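The abstract does not give Nodeinfo's scoring formula, so the following is only a minimal sketch of how entropy-weighted message terms can rank log regions. It assumes the log has already been split into "documents" (for example, all messages from one node during one hour), that terms are whitespace-separated tokens, and that each term's entropy across documents is turned into a weight in the style of log-entropy weighting from information retrieval: terms concentrated in few documents get weights near 1, terms spread evenly get weights near 0. The function names and the exact formula are hypothetical illustrations, not necessarily the definition used in the paper.

```python
import math
from collections import Counter


def term_entropy_weights(docs):
    """Map each term to a weight in [0, 1] derived from its entropy across docs.

    Assumed formula (log-entropy style, hypothetical for Nodeinfo):
        G[t] = 1 + sum_j p_tj * log(p_tj) / log(N),
    where p_tj is the share of term t's occurrences that fall in document j.
    """
    n = len(docs)
    totals = Counter()
    for d in docs:
        totals.update(d)  # total occurrences of each term over all documents
    weights = {}
    for term, total in totals.items():
        if n <= 1:
            weights[term] = 1.0  # entropy is undefined for a single document
            continue
        h = 0.0
        for d in docs:
            c = d.get(term, 0)
            if c:
                p = c / total
                h += p * math.log(p)  # h is the negated entropy (<= 0)
        weights[term] = 1.0 + h / math.log(n)
    return weights


def score_documents(docs, weights):
    """Score each document as the norm of its entropy-weighted log term counts."""
    scores = []
    for d in docs:
        s = sum((weights[t] * math.log2(c + 1)) ** 2 for t, c in d.items())
        scores.append(math.sqrt(s))
    return scores


def rank_documents(raw_docs):
    """Return document indices from most to least anomalous, plus raw scores."""
    docs = [Counter(text.split()) for text in raw_docs]
    weights = term_entropy_weights(docs)
    scores = score_documents(docs, weights)
    order = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

For instance, with three identical "kernel: link up ok" documents and one "kernel: machine check exception cpu 3" document, the term "kernel:" appears everywhere and gets weight 0, while the rare terms in the last document get weight 1, so that document ranks first. In an alert-detection setting, the top-ranked documents would be the candidate log regions to inspect; how many to flag is a further assumption left to the operator.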