Building a new classifier in an ensemble using streaming unlabeled data

Authors:
Mehmed Kantardzic;Joung Woo Ryu;Chamila Walgampaya
Affiliations:
CECS Department, Speed School of Engineering, University of Louisville, Louisville, KY;CECS Department, Speed School of Engineering, University of Louisville, Louisville, KY;CECS Department, Speed School of Engineering, University of Louisville, Louisville, KY
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Year:
2010

Abstract

It is expensive and impractical to manually label all samples in real-world streaming data when the correct class is not available in real time. In this paper, we propose an ensemble method that determines which samples from an unlabeled stream should be labeled, and when they should be labeled, according to changes in the distribution of the unlabeled data. In particular, the point in time at which labeling occurs is an important practical factor in building an efficient ensemble. To evaluate our ensemble method, we used synthetic streaming data with concept drift and the intrusion detection data from the KDD Cup '99. We compared the proposed method with existing ensemble methods that periodically build new classifiers for an ensemble. On the synthetic streaming data, the proposed method achieved 14.1% higher classification accuracy on average and reduced the number of new classifiers by 12.6% on average. On the intrusion detection data, our method achieved accuracy similar to that of existing methods while using only 0.007% of the labeled streaming data.
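The general idea of labeling only when the unlabeled stream appears to have changed can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the change test or the base classifier, so the distance-to-centroid test, the nearest-centroid classifier, and every name and parameter here (`CentroidClassifier`, `process_chunk`, `oracle`, `radius`, `drift_fraction`) are hypothetical stand-ins chosen for illustration.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Nearest-centroid classifier built from a small labeled sample (illustrative)."""

    def __init__(self, samples, labels):
        groups = defaultdict(list)
        for x, y in zip(samples, labels):
            groups[y].append(x)
        # One centroid per class: the coordinate-wise mean of its samples.
        self.centroids = {
            y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in groups.items()
        }

    def nearest(self, x):
        """Return (label, distance) for the closest class centroid."""
        return min(
            ((y, math.dist(x, c)) for y, c in self.centroids.items()),
            key=lambda pair: pair[1],
        )


def process_chunk(ensemble, chunk, oracle, radius=1.0, drift_fraction=0.3):
    """Request labels and add a classifier only if the chunk looks unfamiliar.

    Assumed change test: a sample is "suspicious" when it lies farther than
    `radius` from every centroid of every ensemble member. Returns the number
    of labels requested for this chunk.
    """
    if not chunk:
        return 0
    suspicious = [
        x for x in chunk
        if all(clf.nearest(x)[1] > radius for clf in ensemble)
    ]
    if len(suspicious) / len(chunk) <= drift_fraction:
        return 0  # distribution looks unchanged: no labeling, no new classifier
    labels = [oracle(x) for x in suspicious]  # labeling cost is paid only here
    ensemble.append(CentroidClassifier(suspicious, labels))
    return len(suspicious)


def predict(ensemble, x):
    """Predict with the ensemble member whose centroid is closest to x."""
    return min((clf.nearest(x) for clf in ensemble), key=lambda p: p[1])[0]
```

In this sketch a chunk that resembles previously seen data costs no labels, whereas a periodic scheme would label and retrain on every chunk regardless. A real system would replace the distance test with a proper distribution-change detector and the `oracle` callback with a human or delayed ground-truth source.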