Classifying evolving data streams with partially labeled data

Authors:
Hanen Borchani; Pedro Larrañaga; Concha Bielza
Affiliations:
Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain (all three authors)
Venue:
Intelligent Data Analysis
Year:
2011

Abstract

Recently, several approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most are based on supervised classification algorithms that assume the true labels are immediately and entirely available in the data stream. Unfortunately, this assumption is often violated in real-world applications, because obtaining all true labels is expensive or takes a long time. To deal with this problem, we propose in this paper a new semi-supervised approach for handling concept-drifting data streams that contain both labeled and unlabeled instances. First, contrary to existing approaches, we monitor three possible kinds of drift: feature, conditional, or dual drift. Drift detection is based on a hypothesis test that compares the Kullback-Leibler divergence between old and recent data; the distribution of this statistic under the null hypothesis that both samples come from the same distribution is approximated via a bootstrap method. Then, if any drift occurs, a new classifier is learned from the recent data using the EM algorithm; otherwise, the current classifier is left unchanged. The approach is general enough to be applied to different classification models. Experimental studies on both synthetic and real-world data sets, using the naive Bayes classifier and logistic regression, demonstrate that our approach performs well.
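To make the detection step more concrete, the Python sketch below tests for a change in the distribution of one discrete variable (for example, a discretized feature or a tuple of feature values) between an old and a recent window. It is an illustration under stated assumptions, not the authors' implementation: the divergence direction KL(recent || old), the additive smoothing constant `alpha`, simulating the null hypothesis by resampling both windows from the pooled data, the default `n_boot` and `significance` values, and the function names `kl_divergence` and `bootstrap_drift_test` are all hypothetical choices. Only a feature-drift style test is shown; the EM retraining step is not.

```python
import math
import random
from collections import Counter


def kl_divergence(old, recent, support, alpha=1.0):
    """KL(P_recent || P_old) between the empirical distributions of two
    samples of discrete values. Additive smoothing with `alpha`
    (an assumption) keeps every probability strictly positive."""
    c_old, c_new = Counter(old), Counter(recent)
    k = len(support)
    z_old = len(old) + alpha * k
    z_new = len(recent) + alpha * k
    kl = 0.0
    for v in support:
        p = (c_new[v] + alpha) / z_new  # smoothed recent-window probability
        q = (c_old[v] + alpha) / z_old  # smoothed old-window probability
        kl += p * math.log(p / q)
    return kl


def bootstrap_drift_test(old, recent, n_boot=500, significance=0.05, seed=0):
    """Hypothetical drift test: returns (observed_kl, p_value, drift_flag).

    The null distribution of the KL statistic is approximated by drawing
    both windows, with replacement, from the pooled data, i.e. as if old
    and recent instances came from the same distribution."""
    rng = random.Random(seed)
    support = sorted(set(old) | set(recent))
    observed = kl_divergence(old, recent, support)
    pooled = list(old) + list(recent)
    exceed = 0
    for _ in range(n_boot):
        boot_old = rng.choices(pooled, k=len(old))
        boot_new = rng.choices(pooled, k=len(recent))
        if kl_divergence(boot_old, boot_new, support) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (exceed + 1) / (n_boot + 1)
    return observed, p_value, p_value < significance


if __name__ == "__main__":
    # Toy windows: the share of value "a" falls from 50% to 10%.
    old = ["a"] * 200 + ["b"] * 200
    recent = ["a"] * 40 + ["b"] * 360
    kl, p, drift = bootstrap_drift_test(old, recent, n_boot=200)
    print(f"KL={kl:.3f}  p={p:.4f}  drift={drift}")
```

Pooling the two windows is one common way to approximate the null hypothesis of a single shared distribution: any divergence between resampled windows then reflects sampling noise alone, and the p-value is the fraction of bootstrap divergences at least as large as the observed one. In a scheme like the one described above, a significant result would trigger relearning the classifier from recent data; otherwise the current model would be kept.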