A hybrid machine learning approach to network anomaly detection

  • Authors:
  • Taeshik Shon;Jongsub Moon

  • Affiliations:
  • IP Lab, TN R&D Center, Dong Suwon P.O. Box 105, Maetan-3dong, Suwon 442-600, Republic of Korea;CIST/GSIS, Korea University, 5-ga, Anamdong, Seongbukgu, Seoul 136-701, Republic of Korea

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2007

Quantified Score

Hi-index 0.07

Visualization

Abstract

Zero-day cyber attacks such as worms and spy-ware are becoming increasingly widespread and dangerous. The existing signature-based intrusion detection mechanisms are often not sufficient in detecting these types of attacks. As a result, anomaly intrusion detection methods have been developed to cope with such attacks. Among the variety of anomaly detection approaches, the Support Vector Machine (SVM) is known to be one of the best machine learning algorithms to classify abnormal behaviors. The soft-margin SVM is one of the well-known basic SVM methods using supervised learning. However, it is not appropriate to use the soft-margin SVM method for detecting novel attacks in Internet traffic since it requires pre-acquired learning information for supervised learning procedure. Such pre-acquired learning information is divided into normal and attack traffic with labels separately. Furthermore, we apply the one-class SVM approach using unsupervised learning for detecting anomalies. This means one-class SVM does not require the labeled information. However, there is downside to using one-class SVM: it is difficult to use the one-class SVM in the real world, due to its high false positive rate. In this paper, we propose a new SVM approach, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning and low false alarm capability, similar to that of a supervised SVM approach. We use the following additional techniques to improve the performance of the proposed approach (referred to as Anomaly Detector using Enhanced SVM): First, we create a profile of normal packets using Self-Organized Feature Map (SOFM), for SVM learning without pre-existing knowledge. Second, we use a packet filtering scheme based on Passive TCP/IP Fingerprinting (PTF), in order to reject incomplete network traffic that either violates the TCP/IP standard or generation policy inside of well-known platforms. Third, a feature selection technique using a Genetic Algorithm (GA) is used for extracting optimized information from raw internet packets. Fourth, we use the flow of packets based on temporal relationships during data preprocessing, for considering the temporal relationships among the inputs used in SVM learning. Lastly, we demonstrate the effectiveness of the Enhanced SVM approach using the above-mentioned techniques, such as SOFM, PTF, and GA on MIT Lincoln Lab datasets, and a live dataset captured from a real network. The experimental results are verified by m-fold cross validation, and the proposed approach is compared with real world Network Intrusion Detection Systems (NIDS).