A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

Authors:
Mohammad M. Masud;Jing Gao;Latifur Khan;Jiawei Han;Bhavani Thuraisingham
Affiliations:
Department of Computer Science, University of Texas at Dallas;Department of Computer Science, University of Illinois at Urbana-Champaign;Department of Computer Science, University of Texas at Dallas;Department of Computer Science, University of Illinois at Urbana-Champaign;Department of Computer Science, University of Texas at Dallas
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Abstract

We propose a multi-partition, multi-chunk ensemble-classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate empirically its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
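
The training step summarized in the abstract (v classifiers built from r consecutive chunks via v-fold partitioning) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid base learner, the majority vote, the hold-out error estimate, and all names (`NearestCentroid`, `train_partition_classifiers`, `ensemble_predict`) are assumptions made for the example. How the ensemble is maintained as new chunks arrive is not described in the abstract and is left out.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner (an assumption; any classifier could take its place)."""

    def fit(self, rows):
        sums = {}
        counts = Counter()
        for features, label in rows:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] += 1
        # One mean vector per class label.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train_partition_classifiers(chunks, v, seed=0):
    """Merge the r given chunks, split them into v folds, and train v classifiers.

    Classifier i is trained on every fold except fold i; its error on the
    held-out fold i is returned alongside it (hypothetical design choice).
    """
    data = [row for chunk in chunks for row in chunk]
    if v < 2 or len(data) < v:
        raise ValueError("need v >= 2 and at least v labeled instances")
    random.Random(seed).shuffle(data)
    folds = [data[i::v] for i in range(v)]
    trained = []
    for held_out in range(v):
        train_rows = [
            row for j, fold in enumerate(folds) if j != held_out for row in fold
        ]
        model = NearestCentroid().fit(train_rows)
        errors = sum(model.predict(x) != y for x, y in folds[held_out])
        trained.append((model, errors / len(folds[held_out])))
    return trained


def ensemble_predict(models, features):
    """Majority vote over the ensemble (an assumed combination rule)."""
    votes = Counter(model.predict(features) for model in models)
    return votes.most_common(1)[0][0]


# Usage: r = 2 chunks of labeled (features, label) pairs, v = 4 partitions.
chunks = [
    [((i % 3, i % 2), 0) for i in range(10)]
    + [((10 + i % 3, 10 + i % 2), 1) for i in range(10)]
    for _ in range(2)
]
trained = train_partition_classifiers(chunks, v=4)
models = [model for model, _ in trained]
print(ensemble_predict(models, (1, 0)), ensemble_predict(models, (11, 10)))
```

In the single-partition, single-chunk setting the abstract contrasts against, the same data would yield one classifier per chunk; here each call yields v classifiers that each see a (v-1)/v share of the r merged chunks.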