Semi-supervised ensemble learning of data streams in the presence of concept drift
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
Manually labeling every sample in real-world streaming data is expensive and impractical when the correct class is not available in real time. In this paper, we propose an ensemble method that determines which samples from an unlabeled stream should be labeled, and when, according to changes in the distribution of the unlabeled data. In particular, the point in time at which labeling occurs is an important factor in building an efficient ensemble in practice. To evaluate our ensemble method, we used synthetic streaming data with concept drift and the intrusion detection data from the KDD Cup '99. We compared the proposed method with existing ensemble methods that periodically build new classifiers for the ensemble. On the synthetic streaming data, the proposed method achieved 14.1% higher classification accuracy on average and reduced the number of new classifiers by 12.6% on average. On the intrusion detection data, our method achieved accuracy similar to that of existing methods while using only 0.007% of the labeled streaming data.
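The idea of requesting labels only when the distribution of the unlabeled stream changes can be illustrated with a minimal sketch. The abstract does not specify the paper's drift criterion, so everything here is an assumption made for illustration: the chunked stream, the mean-shift score in `drift_score`, the `threshold` value, and the helper names `labeling_points` and `make_chunk` are all hypothetical, not the authors' method.

```python
import random
from statistics import mean, pstdev


def drift_score(reference, chunk):
    """Largest per-feature shift of the mean, in units of the reference std.

    This is an assumed, deliberately simple drift measure.
    """
    score = 0.0
    for ref_col, new_col in zip(zip(*reference), zip(*chunk)):
        spread = pstdev(ref_col) or 1.0  # guard against zero variance
        score = max(score, abs(mean(new_col) - mean(ref_col)) / spread)
    return score


def labeling_points(chunks, threshold=1.0):
    """Return the indices of chunks for which labels would be requested."""
    reference = None
    selected = []
    for i, chunk in enumerate(chunks):
        if reference is None or drift_score(reference, chunk) > threshold:
            # In a full system, this chunk would be labeled and used to
            # train a new ensemble member.
            selected.append(i)
            reference = chunk  # the labeled chunk becomes the new reference
    return selected


def make_chunk(rng, center, size=200):
    """Synthetic unlabeled chunk: two features, the first centered at `center`."""
    return [(rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)) for _ in range(size)]


rng = random.Random(0)
# Five chunks drawn around 0, then an abrupt shift to five chunks around 3.
chunks = [make_chunk(rng, 0.0) for _ in range(5)]
chunks += [make_chunk(rng, 3.0) for _ in range(5)]
points = labeling_points(chunks)
print(points)
```

In this toy stream, labels are requested only for the first chunk and for the chunk where the shift occurs, whereas a periodic scheme would label and train on every chunk. A real detector would likely need a criterion that is robust to gradual drift and to high-dimensional data.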