Experience mining Google's production console logs

  • Authors:
  • Wei Xu; Ling Huang; Armando Fox; David Patterson; Michael Jordan

  • Affiliations:
  • University of California at Berkeley; Intel Labs, Berkeley; University of California at Berkeley; University of California at Berkeley; University of California at Berkeley

  • Venue:
  • SLAML'10 Proceedings of the 2010 workshop on Managing systems via log analysis and machine learning techniques
  • Year:
  • 2010

Abstract

We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude larger than, and contains almost 20 times as many message types as, the Hadoop data set we used in [19]. It also has many properties that are unique to large-scale production deployments (e.g., the system stays on for several months, and multiple versions of the software can run concurrently). Our early experience shows that our techniques, including source-code-based log parsing, state- and sequence-based feature creation, and problem detection, work well on this production data set. We also discuss our experience in using our log parser to assist with log sanitization.