A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

  • Authors:
  • Mohammad M. Masud; Jing Gao; Latifur Khan; Jiawei Han; Bhavani Thuraisingham

  • Affiliations:
  • Department of Computer Science, University of Texas at Dallas; Department of Computer Science, University of Illinois at Urbana-Champaign; Department of Computer Science, University of Texas at Dallas; Department of Computer Science, University of Illinois at Urbana-Champaign; Department of Computer Science, University of Texas at Dallas

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

We propose a multi-partition, multi-chunk ensemble classification technique for concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. This multi-partition, multi-chunk ensemble technique significantly reduces classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate empirically that it is more effective than other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
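
The training step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the r chunks are pooled and shuffled, each of the v classifiers is trained on the v-1 partitions left after holding one out, the base learner is a toy nearest-centroid classifier, and the ensemble predicts by unweighted majority vote. The names `CentroidClassifier`, `train_mpc_classifiers`, and `ensemble_predict` are hypothetical, and the sketch does not cover how the ensemble is updated as new chunks arrive.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Toy base learner (an assumption): predicts the class with the nearest mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, value in enumerate(xi):
                acc[j] += value
        # One mean vector per class label.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def train_mpc_classifiers(chunks, v, seed=0):
    """Train v classifiers from r consecutive chunks via v-fold partitioning.

    `chunks` is a list of r chunks; each chunk is a list of (features, label).
    Assumption: classifier i is trained on every partition except partition i.
    """
    if v < 2:
        raise ValueError("v must be at least 2")
    # Pool the r chunks into a single training set, then shuffle it.
    data = [pair for chunk in chunks for pair in chunk]
    random.Random(seed).shuffle(data)
    # Split the pooled data into v roughly equal partitions.
    folds = [data[i::v] for i in range(v)]
    classifiers = []
    for held_out in range(v):
        train = [p for i, fold in enumerate(folds) if i != held_out for p in fold]
        X = [x for x, _ in train]
        y = [label for _, label in train]
        classifiers.append(CentroidClassifier().fit(X, y))
    return classifiers


def ensemble_predict(classifiers, x):
    """Unweighted majority vote over the ensemble (an assumption)."""
    votes = Counter(clf.predict_one(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Usage: r = 2 chunks of labeled 2-D points, v = 3 partitions.
chunks = [
    [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((5.0, 5.1), "b"), ((5.2, 4.9), "b")],
    [((0.1, 0.3), "a"), ((0.3, 0.2), "a"), ((4.8, 5.0), "b"), ((5.1, 5.3), "b")],
]
ensemble = train_mpc_classifiers(chunks, v=3)
print(len(ensemble), ensemble_predict(ensemble, (0.1, 0.1)))
```

Because each classifier sees a different subset of the same r chunks, the ensemble members differ from one another while still being trained on more data than a single chunk would provide.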