Data stream classification for intrusion detection poses at least three major challenges. First, these data streams are typically of infinite length, making traditional multipass learning algorithms inapplicable. Second, they exhibit significant concept drift as attackers react and adapt to defenses. Third, for data streams that do not have any fixed feature set, such as text streams, an additional feature extraction and selection task must be performed, and traditional feature extraction techniques fail if the number of candidate features is too large. To address the first two challenges, this article proposes a multipartition, multichunk ensemble classifier in which a collection of v classifiers is trained from r consecutive data chunks using v-fold partitioning of the data, yielding an ensemble of such classifiers. This multipartition, multichunk ensemble technique significantly reduces classification error compared to existing single-partition, single-chunk ensemble approaches, in which each classifier is trained on a single data chunk. To address the third challenge, a feature extraction and selection technique is proposed for data streams that do not have any fixed feature set. The scalability of this feature extraction technique is demonstrated through an implementation for the Hadoop MapReduce cloud computing architecture. Both theoretical and empirical evidence demonstrate the effectiveness of the proposed approach over other state-of-the-art stream classification techniques on synthetic data, real botnet traffic, and malicious executables.
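The multipartition, multichunk training step can be illustrated with a minimal Python sketch. It is not the article's implementation: the nearest-centroid base learner, the function names, the random shuffle, and the plain majority vote are all assumptions made for illustration, and the abstract does not say how the ensemble is pruned or updated as new chunks arrive. The sketch only shows how v classifiers might be obtained from r consecutive chunks by v-fold partitioning, with each classifier trained on all folds but one.

```python
import random
from collections import Counter


def train_centroid(examples):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter()
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist2(model[y]))


def train_multipartition(chunks, v, seed=0):
    """Train v classifiers from the given r chunks using v-fold partitioning.

    Assumption: the r chunks are merged, shuffled, and split into v folds;
    classifier i is trained on every fold except fold i.
    """
    data = [example for chunk in chunks for example in chunk]
    rng = random.Random(seed)
    rng.shuffle(data)
    folds = [data[i::v] for i in range(v)]
    models = []
    for held_out in range(v):
        train = [ex for j, fold in enumerate(folds) if j != held_out
                 for ex in fold]
        models.append(train_centroid(train))
    return models


def ensemble_predict(models, x):
    """Assumed combination rule: simple majority vote over the ensemble."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

With r = 2 chunks and v = 3, `train_multipartition([chunk_a, chunk_b], v=3)` returns three classifiers, each of which has seen roughly two thirds of the merged data. A single-partition, single-chunk baseline would instead train one classifier per chunk.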