Classification of Log Files with Limited Labeled Data

  • Authors: Stefan Hommes; Radu State; Thomas Engel

  • Affiliations: University of Luxembourg, SnT, 4, rue Alphonse Weicker, L-2721 Luxembourg (all authors)

  • Venue: Proceedings of Principles, Systems and Applications on IP Telecommunications
  • Year: 2013

Abstract

We address the problem of anomaly detection in log files that contain a very large number of records. To this end, we apply label propagation, a semi-supervised learning technique. The strength of this approach is that only a small amount of labelled data is needed to label the remaining data. This is an advantage because labelling requires human expertise, which is costly and becomes infeasible for large datasets. Although our approach is generally applicable, we focus on detecting anomalous records in firewall log files. The records are first separated into windows, which are compared using different distance functions to determine their similarity. We then apply label propagation to label the complete dataset in a limited number of iterations. We demonstrate our approach on a realistic dataset from an ISP.
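
The pipeline described in the abstract (windows of records, a distance function, then label propagation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each window has already been summarised as a numeric feature vector, and it uses Euclidean distance with a Gaussian affinity as one plausible choice. The function names, the `sigma` parameter, and the toy feature values are all hypothetical.

```python
import math


def euclidean(a, b):
    """One possible distance between two window feature vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_labels(windows, labels, distance=euclidean,
                     sigma=0.2, max_iter=100, tol=1e-6):
    """Iterative label propagation with clamped seed labels.

    windows: list of feature vectors, one per window of log records.
    labels:  list of class names, with None for unlabelled windows.
    """
    n = len(windows)
    classes = sorted({lab for lab in labels if lab is not None})

    # Gaussian affinity: similar windows (small distance) get weights near 1.
    weights = [[math.exp(-distance(windows[i], windows[j]) ** 2
                         / (2 * sigma ** 2))
                for j in range(n)] for i in range(n)]
    # Row-normalise the affinities into transition probabilities.
    transition = [[w / sum(row) for w in row] for row in weights]

    def seed(label):
        # Labelled windows get a one-hot row, unlabelled ones a uniform row.
        if label is None:
            return [1.0 / len(classes)] * len(classes)
        return [1.0 if c == label else 0.0 for c in classes]

    scores = [seed(lab) for lab in labels]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp known labels so they are never overwritten.
                updated.append(seed(labels[i]))
            else:
                updated.append([
                    sum(transition[i][j] * scores[j][k] for j in range(n))
                    for k in range(len(classes))
                ])
        change = max(abs(a - b)
                     for new_row, old_row in zip(updated, scores)
                     for a, b in zip(new_row, old_row))
        scores = updated
        if change < tol:
            break

    # Assign each window the class with the highest propagated score.
    return [classes[max(range(len(classes)), key=lambda k: row[k])]
            for row in scores]


# Toy data: six windows with two hypothetical, pre-scaled features each.
windows = [
    [0.10, 0.10], [0.20, 0.10], [0.15, 0.20],   # cluster near the normal seed
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.95],   # cluster near the anomalous seed
]
# Only one window per class is labelled by hand.
labels = ["normal", None, None, "anomalous", None, None]

predicted = propagate_labels(windows, labels)
print(predicted)
```

In this toy run each unlabelled window takes the label of the seed in its own cluster, which reflects the abstract's point that a few labelled windows can suffice. Any other distance function can be passed through the `distance` argument, so different measures can be compared on the same windows.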