Nonparametric semi-supervised learning for network intrusion detection: combining performance improvements with realistic in-situ training

Authors:
Christopher T. Symons; Justin M. Beaver
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN, USA; Oak Ridge National Laboratory, Oak Ridge, TN, USA
Venue:
Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence
Year:
2012

Abstract

A barrier to the widespread adoption of learning-based network intrusion detection tools is the amount of in-situ training they require to discriminate malicious traffic effectively. Supervised learning techniques require a quantity of labeled examples that is often intractable to obtain and, at best, cost-prohibitive. Recent advances in semi-supervised techniques have demonstrated the ability to generalize well from a significantly smaller set of labeled samples. In network intrusion detection, keeping the number of required training examples reasonable makes it realistic to expect that a learning-based system can be trained in the environment where it will be deployed. Such in-situ training is necessary to ensure that the assumptions underlying the learning process hold, which in turn supports a reasonable belief in the generalization ability of the resulting model. In this paper, we describe the application of a carefully selected nonparametric, semi-supervised learning algorithm to the network intrusion problem and compare its performance with that of other model types, using feature-based data derived from an operational network. We demonstrate dramatic performance improvements over supervised learning and anomaly detection in discriminating real, previously unseen malicious network traffic, while generating an order of magnitude fewer false alerts than any alternative, including a signature-based IDS deployed on the same network.
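To illustrate how a nonparametric semi-supervised learner can generalize from very few labels, the following is a minimal sketch of one widely used graph-based method: harmonic-function label propagation over a Gaussian (RBF) similarity graph. It is an illustrative stand-in and not the algorithm used in this paper. The toy two-dimensional "flow features", the kernel width `sigma`, the iteration count, the 0.5 decision threshold, and the function names `rbf_weights` and `propagate_labels` are all hypothetical choices made for the example.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical label encoding: 1.0 = malicious, 0.0 = benign, None = unlabeled.
Label = Optional[float]


def rbf_weights(points: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """Dense Gaussian (RBF) similarity matrix with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(
    points: Sequence[Sequence[float]],
    labels: Sequence[Label],
    sigma: float = 1.0,
    iters: int = 200,
) -> List[float]:
    """Harmonic-function label propagation (Jacobi iteration).

    Labeled points stay clamped to their labels; each unlabeled point is
    repeatedly set to the similarity-weighted average of the other scores.
    Returns a malicious-ness score in [0, 1] for every point.
    """
    w = rbf_weights(points, sigma)
    n = len(points)
    # Unlabeled points start at the uninformative value 0.5.
    f = [lab if lab is not None else 0.5 for lab in labels]
    for _ in range(iters):
        new_f = list(f)
        for i in range(n):
            if labels[i] is not None:
                continue  # clamp labeled examples to their given label
            total = sum(w[i])
            if total > 0.0:  # a fully isolated point keeps its current score
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f


if __name__ == "__main__":
    # Toy feature vectors: a benign cluster near (0, 0), a malicious one near (5, 5).
    flows = [(0.0, 0.0), (0.3, 0.1), (0.2, 0.4), (5.0, 5.0), (5.2, 4.8), (4.9, 5.3)]
    # Only one labeled example per class; the other four points are unlabeled.
    labels: List[Label] = [0.0, None, None, 1.0, None, None]
    scores = propagate_labels(flows, labels)
    for flow, score in zip(flows, scores):
        verdict = "malicious" if score >= 0.5 else "benign"
        print(flow, round(score, 3), verdict)
```

The sketch has no fixed parameter vector: predictions for unlabeled traffic are defined directly by the labeled and unlabeled points and their pairwise similarities, which is what makes this family of methods nonparametric. The dense O(n²) graph is kept only for readability; a realistic traffic volume would typically call for a sparse nearest-neighbor graph instead.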