Exploring discrepancies in findings obtained with the KDD Cup '99 data set

  • Authors:
  • Vegard Engen; Jonathan Vincent; Keith Phalp

  • Affiliations:
  • Software Systems Research Centre, Bournemouth University, Fern Barrow, Talbot Campus, Poole, UK (all authors; corresponding author: Tel.: +44 1202 965503, e-mail: vengen@bournemouth.ac.uk)

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2011

Abstract

The KDD Cup '99 data set has been widely used for nearly a decade to evaluate intrusion detection prototypes, most of them based on machine learning techniques. The data set served well in the KDD Cup '99 competition to demonstrate that machine learning can be useful in intrusion detection systems. However, there are discrepancies among the findings reported in the literature. Furthermore, some researchers have published criticisms of the data (and of the DARPA data from which the KDD Cup '99 data was derived), questioning the validity of results obtained with it. Despite these criticisms, researchers continue to use the data due to a lack of better publicly available alternatives. It is therefore important to establish the value of the data set and of the findings from the extensive body of research based on it, a body of work largely ignored by the existing critiques. This paper reports on an empirical investigation demonstrating the impact of several methodological differences in the use of the publicly available subsets, which uncovers several underlying causes of the discrepancies in the results reported in the literature. These findings allow us to better interpret the current body of research, and inform recommendations for future use of the data set.