Mining Data Streams with Labeled and Unlabeled Training Examples

Authors:
Peng Zhang;Xingquan Zhu;Li Guo
Venue:
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Year:
2009

Citing 0
Cited 10

SKIF: a data imputation framework for concept drifting data streams
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Robust ensemble learning for mining noisy data streams
Decision Support Systems

Active learning from stream data using optimal weight classifier ensemble
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Cloud-based malware detection for evolving data streams
ACM Transactions on Management Information Systems (TMIS)

Batch weighted ensemble for mining data streams with concept drift
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems

Active learning with evolving streaming data
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III

Predictive Data Stream Filtering
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03

A framework for application-driven classification of data streams
Neurocomputing

Semi-supervised ensemble learning of data streams in the presence of concept drift
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II

Learning from data streams with only positive and unlabeled data
Journal of Intelligent Information Systems

Abstract

In this paper, we propose a framework to build prediction models from data streams that contain both labeled and unlabeled examples. We argue that, because data collection capacity keeps growing while labeling resources remain limited, the stream data at hand may contain only a small number of labeled examples, whereas a large portion of the data remains unlabeled but can still be beneficial for learning. Unleashing the full potential of the unlabeled instances for stream data mining is, however, a significant challenge: even fully labeled data streams may suffer from concept drift, and inappropriate use of the unlabeled samples may make the problem worse. To build prediction models, we first divide the stream data into four categories, each corresponding to a situation in which concept drift may or may not exist in the labeled and unlabeled data. We then propose a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM), which leverages both labeled and unlabeled samples to build prediction models. Experimental results and comparisons on both synthetic and real-world data streams demonstrate that the proposed framework helps build prediction models that are more accurate than those built by other simple approaches.
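
The four-way categorization mentioned in the abstract can be read as the cross product of two yes/no questions: does concept drift affect the labeled examples, and does it affect the unlabeled examples? The sketch below only enumerates those combinations. It is an illustration under stated assumptions, not the RK-TS3VM framework itself: the abstract does not say how drift is detected or how the categories are named or ordered, so `ChunkCategory`, `CATEGORIES`, and `categorize` are hypothetical names, and the drift flags are assumed to be supplied by some external detector.

```python
from itertools import product
from typing import NamedTuple


class ChunkCategory(NamedTuple):
    """One drift situation for a stream chunk (hypothetical naming)."""
    labeled_drifts: bool    # assumed flag: drift present in the labeled examples
    unlabeled_drifts: bool  # assumed flag: drift present in the unlabeled examples

    def describe(self) -> str:
        labeled = "drift" if self.labeled_drifts else "no drift"
        unlabeled = "drift" if self.unlabeled_drifts else "no drift"
        return f"labeled: {labeled}, unlabeled: {unlabeled}"


# Two yes/no questions give 2 x 2 = 4 categories.
# The ordering here is arbitrary and not taken from the paper.
CATEGORIES = [ChunkCategory(l, u) for l, u in product((False, True), repeat=2)]


def categorize(labeled_drifts: bool, unlabeled_drifts: bool) -> int:
    """Return the 0-based index in CATEGORIES matching the given drift flags."""
    return CATEGORIES.index(ChunkCategory(labeled_drifts, unlabeled_drifts))


if __name__ == "__main__":
    for index, category in enumerate(CATEGORIES):
        print(index, category.describe())
```

In a real pipeline, the category index would presumably select how labeled and unlabeled samples from earlier chunks are reused; the abstract does not give those details, so they are left out here.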