Toward supervised anomaly detection

Authors:
Nico Görnitz;Marius Kloft;Konrad Rieck;Ulf Brefeld
Affiliations:
Machine Learning Laboratory, Technische Universität Berlin, Berlin, Germany and Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York;Machine Learning Laboratory, Technische Universität Berlin, Berlin, Germany and Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York;University of Göttingen, Dep. of Computer Science, Göttingen, Germany;Technische Universität Darmstadt and German Institute for International Educational Research, Darmstadt, Germany
Venue:
Journal of Artificial Intelligence Research
Year:
2013

Abstract

Anomaly detection is commonly regarded as an unsupervised learning task because anomalies stem from adversarial or unlikely events with unknown distributions. However, the predictive performance of purely unsupervised anomaly detection often falls short of the detection rates required in many tasks, so labeled data is needed to guide model generation. Our first contribution shows that classical semi-supervised approaches, which originate from a supervised classifier, are inappropriate and hardly detect new and unknown anomalies. We argue that semi-supervised anomaly detection needs to be grounded in the unsupervised learning paradigm, and we devise a novel algorithm that meets this requirement. Although the optimization problem is intrinsically non-convex, we further show that it has a convex equivalent under relatively mild assumptions. Additionally, we propose an active learning strategy to automatically filter candidates for labeling. In an empirical study on network intrusion detection data, we observe that the proposed learning methodology requires much less labeled data than the state of the art while achieving higher detection accuracies.
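
As a rough illustration of the setting the abstract describes, the sketch below pairs an unsupervised detector with a simple labeling-candidate filter. It is not the algorithm proposed in the paper: the centroid-based detector, the quantile radius, and the "closest to the boundary" query rule are all assumptions chosen for brevity, and the function names (`centroid`, `scores`, `radius`, `query_candidates`) and the synthetic data are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def scores(points, center):
    """Anomaly score: Euclidean distance of each point to the centre."""
    return [math.dist(p, center) for p in points]


def radius(score_list, quantile=0.95):
    """Radius that encloses roughly `quantile` of the training points."""
    ordered = sorted(score_list)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def query_candidates(score_list, r, k=5):
    """Indices of the k points closest to the boundary (assumed query rule)."""
    by_margin = sorted(range(len(score_list)),
                       key=lambda i: abs(score_list[i] - r))
    return by_margin[:k]


# Synthetic, unlabeled data: mostly "normal" points plus a few outliers.
random.seed(0)
normal = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
outliers = [[random.gauss(6, 0.5), random.gauss(6, 0.5)] for _ in range(5)]
data = normal + outliers

c = centroid(data)
s = scores(data, c)
r = radius(s)

# Points outside the radius are flagged as anomalous.
flagged = [i for i, v in enumerate(s) if v > r]

# Points nearest the boundary are the ones an expert would be asked to label.
to_label = query_candidates(s, r, k=5)
```

In this toy setup the detector is fit without any labels, and labels would be requested only for the handful of borderline points in `to_label`. How such labels are fed back into the model is the part the paper addresses and is deliberately left out here.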