Mining Recurring Concept Drifts with Limited Labeled Streaming Data

Authors:
Peipei Li; Xindong Wu; Xuegang Hu
Affiliations:
Hefei University of Technology; Hefei University of Technology and University of Vermont; Hefei University of Technology
Venue:
ACM Transactions on Intelligent Systems and Technology (TIST)
Year:
2012

Abstract

Tracking recurring concept drifts is a significant issue in machine learning and data mining, and it arises frequently in real-world stream classification problems. Learning recurring concepts in a data stream that contains unlabeled data is a challenge for many streaming classification algorithms, yet it has received little attention from the research community. Motivated by this challenge, this article focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose REDLLA, a semi-supervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, which adopts a decision tree as the classification model. As the tree grows, a clustering algorithm based on k-means is applied to produce concept clusters, and unlabeled data are labeled by majority class at the leaves. Based on the deviations between historical and new concept clusters, potential concept drifts are identified and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of REDLLA over three state-of-the-art online classification algorithms (CVFDT, DWCDS, and CDRDT) and several known online semi-supervised algorithms, even when more than 90% of the data are unlabeled.
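
The following Python sketch illustrates the general idea the abstract describes. It is not the REDLLA implementation. It omits the decision tree entirely, and the function names, the deterministic initialization, the nearest-centroid deviation measure, and the sample data are all assumptions made for illustration. It clusters a small batch of points with k-means, gives each unlabeled point the majority class of the labeled points in its cluster, and measures how far a new set of cluster centroids lies from an older set.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment against the final centroids.
    assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return centroids, assign


def fill_labels(labels, assign, k):
    """Give each unlabeled point (None) the majority class of its cluster."""
    filled = list(labels)
    for c in range(k):
        known = [l for l, a in zip(labels, assign) if a == c and l is not None]
        if not known:
            continue  # no labeled member: leave this cluster unlabeled
        majority = Counter(known).most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == c and filled[i] is None:
                filled[i] = majority
    return filled


def deviation(old_centroids, new_centroids):
    """Mean distance from each new centroid to its nearest old centroid.

    A hypothetical deviation measure; the abstract does not define one.
    """
    nearest = [min(dist(n, o) for o in old_centroids) for n in new_centroids]
    return sum(nearest) / len(nearest)


# Made-up data: two labeled points and four unlabeled ones.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels = ["a", "b", None, None, None, None]

centroids, assign = kmeans(points, k=2)
filled = fill_labels(labels, assign, k=2)
print(filled)  # ['a', 'b', 'a', 'b', 'a', 'b']

# Simulate a later batch whose clusters have moved by (+3, +3).
shifted = [(x + 3.0, y + 3.0) for x, y in centroids]
drift_suspected = deviation(centroids, shifted) > 1.0  # threshold is arbitrary
```

In a streaming setting, a large deviation would flag a potential drift, while a small deviation from a stored set of historical centroids would suggest that an earlier concept has recurred. How REDLLA actually makes these decisions is specified in the article itself, not here.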