Generating regular expression signatures for network traffic classification in trusted network management

  • Authors:
  • Yu Wang; Yang Xiang; Wanlei Zhou; Shunzheng Yu

  • Affiliations:
  • School of Information Technology, Deakin University, Melbourne, 221 Burwood Highway, Burwood VIC 3125, Australia; School of Information Technology, Deakin University, Melbourne, 221 Burwood Highway, Burwood VIC 3125, Australia; School of Information Technology, Deakin University, Melbourne, 221 Burwood Highway, Burwood VIC 3125, Australia; Department of Electronic and Communication Engineering, Sun Yat-Sen University, Guangzhou, China

  • Venue:
  • Journal of Network and Computer Applications
  • Year:
  • 2012

Abstract

Network traffic classification is a critical foundation for trusted network management and security systems. Matching application signatures against traffic payload is widely considered the most reliable classification method. However, deriving accurate and efficient signatures for diverse applications is not a trivial task; current practice is mostly manual, and therefore error-prone and inefficient. In this paper, we tackle the problem of automatic signature generation. In particular, we focus on generating regular expression signatures that use a restricted subset of standard syntax rules, which offers sufficient expressive power and is compatible with most practical systems. We propose a novel approach that takes a labeled training data set as input and produces a set of signatures for matching the application classes present in the data. The approach involves four procedures: pre-processing to extract application session payloads, tokenization to find common substrings and incorporate position constraints, multiple sequence alignment to find common subsequences, and signature construction to transform the results into regular expressions. A real-life full-payload traffic trace is used to evaluate the proposed system, and signatures for a range of applications are automatically derived. The results indicate that the signatures are of high quality and exhibit low false negative and false positive rates.
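To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the last three procedures on a toy input. It is not the authors' algorithm: it swaps in Python's `difflib.SequenceMatcher` as a simple stand-in for tokenization and multiple sequence alignment, and it models the position constraint only as an optional start-of-payload anchor. The function names (`common_tokens`, `fold_tokens`, `build_signature`), the `min_len` threshold, and the assumption that payloads are text strings containing no NUL character are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher

SEP = "\x00"  # assumption: this separator never occurs in the payloads


def common_tokens(a, b, min_len=3):
    """Ordered substrings of length >= min_len shared by a and b."""
    sm = SequenceMatcher(None, a, b, autojunk=False)
    # Matching blocks are increasing in both strings, so the tokens
    # form a common subsequence of a and b.
    return [a[m.a:m.a + m.size]
            for m in sm.get_matching_blocks() if m.size >= min_len]


def fold_tokens(payloads, min_len=3):
    """Progressively align all payloads, keeping only shared tokens."""
    template = payloads[0]
    for payload in payloads[1:]:
        tokens = common_tokens(template, payload, min_len)
        # Joining with SEP stops a later match from spanning two tokens.
        template = SEP.join(tokens)
    return [t for t in template.split(SEP) if t]


def build_signature(payloads, min_len=3):
    """Turn the shared tokens into a regular expression string."""
    tokens = fold_tokens(payloads, min_len)
    if not tokens:
        raise ValueError("no common tokens of the requested length")
    # Simplified position constraint: anchor at offset 0 only when every
    # payload begins with the first token.
    anchored = all(p.startswith(tokens[0]) for p in payloads)
    body = ".*".join(re.escape(t) for t in tokens)
    return ("^" if anchored else "") + body


if __name__ == "__main__":
    samples = [
        "GET /index.html HTTP/1.1\r\nHost: a.example\r\n",
        "GET /img/logo.png HTTP/1.1\r\nHost: b.example\r\n",
        "GET /api/v1/users HTTP/1.1\r\nHost: c.example\r\n",
    ]
    signature = build_signature(samples)
    print(signature)
    # DOTALL lets ".*" span line breaks inside a payload.
    matcher = re.compile(signature, re.DOTALL)
    print(all(matcher.search(s) for s in samples))
```

The sketch restricts itself to literal tokens, `.*` gaps, and a leading `^`, in the spirit of the restricted syntax subset the abstract mentions. A real system would operate on raw bytes, handle per-class training sets, and evaluate false positives against traffic from other classes.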