Mining large distributed log data in near real time

  • Authors:
  • Stefan Weigert;Matti Hiltunen;Christof Fetzer

  • Affiliations:
  • TU Dresden, Dresden, Germany;AT&T Labs Research, Florham Park, NJ;TU Dresden, Dresden, Germany

  • Venue:
  • SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Analyzing huge amounts of log data is often a difficult task, especially if it has to be done in real time (e.g., fraud detection) or when large amounts of stored data are required for the analysis. Graphs are a data structure often used in log analysis. Examples are clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that allow a high throughput of updates with very low latency. In this paper, we present a distributed graph mining system that is able to process around 39 million log entries per second on a 50 node cluster while providing processing latencies below 10 ms. We validate our approach by presenting two example applications, namely telephony fraud detection and internet attack detection. A thorough evaluation proves the scalability and near real-time properties of our system.