Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set

Authors:
Maheshkumar Sabhnani;Gursel Serpen
Affiliations:
Electrical Engineering and Computer Science Department, The University of Toledo, Toledo, OH 43606, USA;Electrical Engineering and Computer Science Department, The University of Toledo, Toledo, OH 43606, USA
Venue:
Intelligent Data Analysis
Year:
2004

Abstract

Many researchers have reported that a large set of machine learning and pattern classification algorithms, trained and tested on the KDD intrusion detection data set, failed to identify most of the user-to-root and remote-to-local attacks. In light of this observation, this paper exposes the deficiencies and limitations of the KDD data set and argues that it should not be used to train pattern recognition or machine learning algorithms for misuse detection in these two attack categories. Multiple analysis techniques are employed to demonstrate, both objectively and subjectively, that the KDD training and testing data subsets represent dissimilar target hypotheses for the user-to-root and remote-to-local attack categories. These techniques consist of switching the roles of the original training and testing data subsets to develop a decision tree classifier, cross-validation on the merged training and testing data subsets, and qualitative and comparative analysis of rules generated independently on the training and testing data subsets by the C4.5 decision tree algorithm. Analysis results clearly suggest that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for the user-to-root or remote-to-local attack categories. It is further noted that the analysis techniques employed to assess the similarity between the two target hypotheses represented by the training and testing data subsets can readily be generalized to data set pairs in other problem domains.
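
The three checks named in the abstract (role switching, cross-validation on the merged subsets, and comparison of independently learned rules) can be illustrated on a toy problem. The sketch below is not the paper's experiment: it assumes a tiny synthetic data set in place of the KDD data, and a one-level decision tree (a decision stump) as a stand-in for C4.5. All names (`fit_stump`, `cross_validate`, `points`, and so on) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (feature vector, class label)
Stump = Tuple[int, int, int]          # (feature index, threshold, label if value > threshold)


def predict(stump: Stump, x: Tuple[int, ...]) -> int:
    feature, threshold, label_above = stump
    return label_above if x[feature] > threshold else 1 - label_above


def accuracy(stump: Stump, data: List[Sample]) -> float:
    return sum(predict(stump, x) == y for x, y in data) / len(data)


def fit_stump(data: List[Sample]) -> Stump:
    """Exhaustively pick the single-feature threshold rule with the best accuracy."""
    candidates = [
        (f, t, label)
        for f in range(len(data[0][0]))
        for t in sorted({x[f] for x, _ in data})
        for label in (1, 0)
    ]
    return max(candidates, key=lambda s: accuracy(s, data))


def cross_validate(data: List[Sample], k: int) -> float:
    """k-fold cross-validation with interleaved folds (index modulo k)."""
    correct = 0
    for i in range(k):
        held_out = data[i::k]
        training = [s for j, s in enumerate(data) if j % k != i]
        stump = fit_stump(training)
        correct += sum(predict(stump, x) == y for x, y in held_out)
    return correct / len(data)


# Synthetic assumption: both subsets share the same points, but their labels
# follow different target hypotheses (feature 0 versus feature 1).
points = [(a, (3 * a) % 10) for a in range(10)]
train = [(p, int(p[0] > 4)) for p in points]
test = [(p, int(p[1] > 4)) for p in points]

rule_train = fit_stump(train)  # rule learned from the "training" subset
rule_test = fit_stump(test)    # rule learned from the "testing" subset

results = {
    "train_to_train": accuracy(rule_train, train),
    "train_to_test": accuracy(rule_train, test),   # original roles
    "test_to_train": accuracy(rule_test, train),   # roles switched
    "merged_cv": cross_validate(train + test, k=5),
}

print("rule from train:", rule_train)
print("rule from test: ", rule_test)
print(results)
```

In this toy setting the two learned rules split on different features, and accuracy drops in both directions when a rule is applied to the other subset. That is the kind of evidence the abstract describes for dissimilar target hypotheses, although real data would call for a full decision tree learner and a much larger sample.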