A close look on n-grams in intrusion detection: anomaly detection vs. classification

Authors:
Christian Wressnegger; Guido Schwenk; Daniel Arp; Konrad Rieck
Affiliations:
idalab GmbH, Berlin, Germany; Berlin University of Technology, Berlin, Germany; University of Göttingen, Göttingen, Germany; University of Göttingen, Göttingen, Germany
Venue:
Proceedings of the 2013 ACM workshop on Artificial intelligence and security
Year:
2013

Abstract

Detection methods based on n-gram models have been widely studied for the identification of attacks and malicious software. These methods usually build on one of two learning schemes: anomaly detection, where a model of normality is constructed from n-grams, or classification, where the model learns to discriminate between benign and malicious n-grams. Although these methods have been successful in many security domains, previous work falls short of explaining why a particular scheme is used and, more importantly, what renders one favorable over the other for a given type of data. In this paper, we take a close look at n-gram models for intrusion detection. We specifically study anomaly detection and classification using n-grams and develop criteria for deciding whether a given type of data is better suited to one scheme or the other. Furthermore, we apply these criteria in the context of web intrusion detection and empirically validate their effectiveness with different learning-based detection methods for client-side and service-side attacks.
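The two learning schemes contrasted above can be illustrated with a minimal sketch. It assumes byte-level 3-grams, uses the plain set of n-grams observed in benign data as the model of normality (loosely in the spirit of set-based detectors such as Anagram), and swaps in multinomial naive Bayes as the classifier. These are illustrative choices, not the detectors evaluated in the paper, and all names (`ngrams`, `NGramAnomalyDetector`, `NGramClassifier`) as well as the toy HTTP requests are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): byte n-grams, a set-based anomaly
# detector and a naive Bayes classifier; not the paper's actual methods.


def ngrams(data: bytes, n: int = 3) -> list[bytes]:
    """Return the overlapping byte n-grams of `data` (sliding window of width n)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


class NGramAnomalyDetector:
    """Anomaly detection: a model of normality learned from benign data only.

    The model is the set of n-grams seen during training; a sample is scored
    by the fraction of its n-grams never observed (0.0 = fully normal).
    """

    def __init__(self, n: int = 3):
        self.n = n
        self.normal = set()

    def fit(self, benign: list[bytes]) -> None:
        for sample in benign:
            self.normal.update(ngrams(sample, self.n))

    def score(self, sample: bytes) -> float:
        grams = ngrams(sample, self.n)
        if not grams:
            return 0.0
        unseen = sum(1 for g in grams if g not in self.normal)
        return unseen / len(grams)


class NGramClassifier:
    """Classification: discriminate benign from malicious n-grams.

    Multinomial naive Bayes with Laplace smoothing, trained on labeled
    examples of both classes; score() returns the log-odds of "malicious".
    """

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, benign: list[bytes], malicious: list[bytes]) -> None:
        self.counts = [Counter(), Counter()]  # index 0 = benign, 1 = malicious
        for label, samples in enumerate((benign, malicious)):
            for sample in samples:
                self.counts[label].update(ngrams(sample, self.n))
        self.totals = [sum(c.values()) for c in self.counts]
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        total = len(benign) + len(malicious)
        self.log_prior = [math.log(len(benign) / total),
                          math.log(len(malicious) / total)]

    def _log_likelihood(self, gram: bytes, label: int) -> float:
        # Laplace-smoothed estimate of log P(gram | class)
        num = self.counts[label][gram] + self.alpha
        den = self.totals[label] + self.alpha * self.vocab_size
        return math.log(num / den)

    def score(self, sample: bytes) -> float:
        log_odds = self.log_prior[1] - self.log_prior[0]
        for g in ngrams(sample, self.n):
            log_odds += self._log_likelihood(g, 1) - self._log_likelihood(g, 0)
        return log_odds  # > 0 means the sample looks malicious


if __name__ == "__main__":
    benign = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
    malicious = [b"GET /../../etc/passwd HTTP/1.1",
                 b"GET /cgi-bin/sh?;cat+/etc/passwd"]

    detector = NGramAnomalyDetector(n=3)
    detector.fit(benign)  # trained on benign data only
    print(detector.score(benign[0]), detector.score(b"/etc/passwd"))

    classifier = NGramClassifier(n=3)
    classifier.fit(benign, malicious)  # needs labeled data from both classes
    print(classifier.score(benign[0]), classifier.score(malicious[0]))
```

The difference between the schemes shows up in `fit`: the anomaly detector never sees attacks, so any n-gram absent from benign traffic raises the score, whereas the classifier requires labeled attacks and only reacts to n-grams whose frequencies differ between the two classes.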