This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of the instances in a stream are labeled. A still more practical assumption is that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model is built in these scenarios, given that the data stream is also evolving? In our previous work, we applied semi-supervised clustering to build classification models from a limited amount of labeled training data. However, that work assumed that the data to be labeled are chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data demonstrates the effectiveness of our approach.