Experience mining Google's production console logs

Authors:
Wei Xu;Ling Huang;Armando Fox;David Patterson;Michael Jordan
Affiliations:
University of California at Berkeley;Intel Labs, Berkeley;University of California at Berkeley;University of California at Berkeley;University of California at Berkeley
Venue:
SLAML'10 Proceedings of the 2010 workshop on Managing systems via log analysis and machine learning techniques
Year:
2010

Citing 16
Cited 4

Pinpoint: Problem Determination in Large, Dynamic Internet Services

DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Splitting-merging model of Chinese word tokenization and segmentation

Natural Language Engineering
Automated System Monitoring and Notification With Swatch

LISA '93 Proceedings of the 7th USENIX conference on System administration
Towards informatic analysis of syslogs

CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Learning case-based knowledge for disambiguating Chinese word segmentation: a preliminary study

SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Path-based failure and evolution management

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
What Supercomputers Say: A Study of Five System Logs

DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Refactoring support for the C++ development tooling

Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion
From dirt to shovels: fully automatic tool generation from ad hoc data

Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallelism by design: data analysis with sawzall

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Clustering event logs using iterative partitioning

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Detecting large-scale system problems by mining console logs

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Online System Problem Detection by Mining Patterns of Console Logs

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
SherLog: error diagnosis by connecting clues from run-time logs

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
X-trace: a pervasive network tracing framework

NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation

Leveraging existing instrumentation to automatically infer invariant-constrained models

Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Inferring networked system models from behavior traces

Proceedings of the 2012 ACM conference on CoNEXT student workshop
Unifying FSM-inference algorithms through declarative specification

Proceedings of the 2013 International Conference on Software Engineering
An integrated framework for optimizing automatic monitoring systems in large IT infrastructures

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude larger and contains almost 20 times as many message types as the Hadoop data set we used in [19]. It also has many properties that are unique to large-scale production deployments (e.g., the system stays on for several months and multiple versions of the software can run concurrently). Our early experience shows that our techniques, including source-code-based log parsing, state- and sequence-based feature creation, and problem detection, work well on this production data set. We also discuss our experience in using our log parser to assist with log sanitization.