Casting out Demons: Sanitizing Training Data for Anomaly Sensors

Authors:
Gabriela F. Cretu;Angelos Stavrou;Michael E. Locasto;Salvatore J. Stolfo;Angelos D. Keromytis
Venue:
SP '08 Proceedings of the 2008 IEEE Symposium on Security and Privacy
Year:
2008

Citing: 0
Cited by: 16

A Self-learning System for Detection of Anomalous SIP Messages
Principles, Systems and Applications of IP Telecommunications. Services and Security for Next Generation Networks

Incorporation of Application Layer Protocol Syntax into Anomaly Detection
ICISS '08 Proceedings of the 4th International Conference on Information Systems Security

Keep your friends close: the necessity for updating an anomaly sensor with legitimate environment changes
Proceedings of the 2nd ACM workshop on Security and artificial intelligence

Protecting a Moving Target: Addressing Web Application Concept Drift
RAID '09 Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection

Adaptive Anomaly Detection via Self-calibration and Dynamic Updating
RAID '09 Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection

A Survey of Voice over IP Security Research
ICISS '09 Proceedings of the 5th International Conference on Information Systems Security

Mining frequent patterns from network flows for monitoring network
Expert Systems with Applications: An International Journal

Machine learning in adversarial environments
Machine Learning

Moving targets: when data classes depend on subjective judgement, or they are crafted by an adversary to mislead pattern analysis algorithms - the cases of content based image retrieval and adversarial classification
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects

Abstracting audit data for lightweight intrusion detection
ICISS'10 Proceedings of the 6th international conference on Information systems security

Vulnerability extrapolation: assisted discovery of vulnerabilities using machine learning
WOOT'11 Proceedings of the 5th USENIX conference on Offensive technologies

Cross-Domain collaborative anomaly detection: so far yet so close
RAID'11 Proceedings of the 14th international conference on Recent Advances in Intrusion Detection

The use of artificial-intelligence-based ensembles for intrusion detection: a review
Applied Computational Intelligence and Soft Computing

Security analysis of online centroid anomaly detection
The Journal of Machine Learning Research

A close look on n-grams in intrusion detection: anomaly detection vs. classification
Proceedings of the 2013 ACM workshop on Artificial intelligence and security

Approaches to adversarial drift
Proceedings of the 2013 ACM workshop on Artificial intelligence and security

Abstract

The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these “micro-models” to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as “attack-free” and “regular” as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
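
The sanitization phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the n-gram set detector, the equal-weight majority vote, and all names and parameters (`train_micro_model`, `is_abnormal`, `sanitize`, `slice_size`, `vote_threshold`, `max_unseen`) are assumptions chosen for illustration. The abstract only states that micro-models are built from small slices of the training data and combined in a voting scheme, in a manner agnostic to the underlying AD algorithm.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(payload: bytes, n: int = 3) -> Set[bytes]:
    """Return the set of byte n-grams in a payload (empty if too short)."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def train_micro_model(slice_: Sequence[bytes], n: int = 3) -> Set[bytes]:
    """Hypothetical stand-in detector: remember every n-gram seen in one slice."""
    model: Set[bytes] = set()
    for payload in slice_:
        model |= ngrams(payload, n)
    return model


def is_abnormal(model: Set[bytes], payload: bytes,
                n: int = 3, max_unseen: float = 0.5) -> bool:
    """Provisional label: abnormal if most n-grams are unknown to this model."""
    grams = ngrams(payload, n)
    if not grams:
        return False
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams) > max_unseen


def sanitize(data: Sequence[bytes], slice_size: int, vote_threshold: float,
             n: int = 3, max_unseen: float = 0.5
             ) -> Tuple[List[bytes], List[bytes]]:
    """Split training data into (sanitized, suspect) sets by micro-model vote."""
    # One micro-model per contiguous slice of the training data.
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    models = [train_micro_model(s, n) for s in slices]

    sanitized: List[bytes] = []
    suspect: List[bytes] = []
    for payload in data:
        # Each micro-model casts one equally weighted vote (an assumption).
        votes = sum(is_abnormal(m, payload, n, max_unseen) for m in models)
        score = votes / len(models)
        (suspect if score > vote_threshold else sanitized).append(payload)
    return sanitized, suspect
```

The sketch relies on the intuition that an attack confined to a few slices only contaminates the micro-models trained on those slices, so the remaining micro-models still vote it abnormal. Calling `sanitize(data, slice_size=3, vote_threshold=0.5)` on a list of payloads returns the inputs to keep for training the final model and the inputs flagged as possible attacks. A real deployment would replace the toy detector with the AD sensor actually in use.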