Building a new classifier in an ensemble using streaming unlabeled data

  • Authors:
  • Mehmed Kantardzic; Joung Woo Ryu; Chamila Walgampaya

  • Affiliations:
  • CECS Department, Speed School of Engineering, University of Louisville, Louisville, KY (all three authors)

  • Venue:
  • IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
  • Year:
  • 2010

Abstract

Manually labeling every sample in real-world streaming data is expensive and impractical when the correct class is not available in real time. In this paper, we propose an ensemble method that determines which samples from a stream of unlabeled data should be labeled, and when they should be labeled, according to changes in the distribution of that data. In particular, the point in time at which labeling occurs is an important practical factor in building an efficient ensemble. To evaluate our ensemble method, we used synthetic streaming data with concept drift and the intrusion detection data from the KDD'99 Cup. We compared its results with those of existing ensemble methods that periodically build new classifiers for an ensemble. On the synthetic streaming data, the proposed method achieved classification accuracy 14.1% higher on average, while the number of new classifiers was reduced by 12.6% on average. On the intrusion detection data, our method achieved accuracy similar to that of existing methods but used only 0.007% of the labeled streaming data.
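
The contrast between periodic and change-driven labeling can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how a change in distribution is detected, so the mean-shift test on a single numeric feature, the function names `distribution_changed` and `labeling_points`, and the parameters `window_size` and `threshold` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def distribution_changed(reference, window, threshold=3.0):
    """Assumed change test: flag a window whose mean lies more than
    `threshold` reference standard deviations from the reference mean."""
    spread = pstdev(reference) or 1e-9  # guard against a constant reference
    return abs(mean(window) - mean(reference)) / spread > threshold


def labeling_points(stream, window_size=50, threshold=3.0):
    """Return the start indices of the windows for which labels would be
    requested (and a new ensemble member trained) in this toy setting."""
    points = []
    reference = stream[:window_size]  # data behind the initial classifier
    last_start = len(stream) - window_size
    for start in range(window_size, last_start + 1, window_size):
        window = stream[start:start + window_size]
        if distribution_changed(reference, window, threshold):
            points.append(start)
            reference = window  # later windows are compared to the new concept
    return points


# Toy stream of unlabeled feature values with one abrupt drift at index 200.
stream = [0.0, 1.0] * 100 + [10.0, 11.0] * 100
print(labeling_points(stream))
```

In this toy stream a periodic scheme that builds a classifier for every window of 50 samples would request labels seven times, whereas the change-driven loop requests them only for the window in which the distribution shifts.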