Mining concept-drifting data streams containing labeled and unlabeled instances

Authors:
Hanen Borchani; Pedro Larrañaga; Concha Bielza
Affiliations:
Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain (all three authors)
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part I
Year:
2010

Abstract

Mining data streams has recently attracted significant attention and is considered a challenging task in supervised classification. Most existing methods for this problem assume that the data streams are entirely labeled. Unfortunately, this assumption is often violated in real-world applications: obtaining labels is time-consuming and expensive, whereas large numbers of unlabeled instances are readily available. In this paper, we propose a new approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, we use the Kullback-Leibler (KL) divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift is detected, a new classifier is learned using the EM algorithm; otherwise, the current classifier is kept unchanged. Our approach is general and can be applied with different classification models. Experiments performed with naive Bayes and logistic regression on two benchmark datasets show that our approach performs well using only limited amounts of labeled instances.
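
To make the detection step more concrete, the following is a minimal sketch of a KL-divergence drift test with a bootstrap threshold. It is not the authors' implementation: the abstract gives no procedural details, so the function names (`kl_divergence`, `detect_drift`), the Laplace smoothing, the pooled resampling scheme, and the parameters `n_boot` and `alpha` are all assumptions made for illustration. The sketch handles a single discrete variable, which corresponds most closely to feature drift; a conditional-drift check would presumably apply a similar test to class distributions estimated from the labeled instances.

```python
import math
import random
from collections import Counter


def kl_divergence(sample_p, sample_q, support, smoothing=1.0):
    """KL(P || Q) between smoothed empirical distributions of two samples.

    Laplace smoothing (an assumption of this sketch) keeps every
    probability strictly positive so the logarithm is always defined.
    """
    counts_p, counts_q = Counter(sample_p), Counter(sample_q)
    total_p = len(sample_p) + smoothing * len(support)
    total_q = len(sample_q) + smoothing * len(support)
    kl = 0.0
    for value in support:
        p = (counts_p[value] + smoothing) / total_p
        q = (counts_q[value] + smoothing) / total_q
        kl += p * math.log(p / q)
    return kl


def detect_drift(old_block, new_block, n_boot=500, alpha=0.05, seed=0):
    """Flag drift when the observed KL exceeds a bootstrap threshold.

    Returns a tuple (drift_detected, observed_kl, threshold).
    """
    rng = random.Random(seed)
    support = sorted(set(old_block) | set(new_block))
    observed = kl_divergence(new_block, old_block, support)

    # Under the "no drift" hypothesis both blocks come from one
    # distribution, so pseudo-blocks are resampled (with replacement)
    # from the pooled data to estimate how large KL gets by chance.
    pooled = list(old_block) + list(new_block)
    null_kls = []
    for _ in range(n_boot):
        pseudo_old = rng.choices(pooled, k=len(old_block))
        pseudo_new = rng.choices(pooled, k=len(new_block))
        null_kls.append(kl_divergence(pseudo_new, pseudo_old, support))

    # Threshold: approximate (1 - alpha) quantile of the null KL values.
    null_kls.sort()
    threshold = null_kls[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return observed > threshold, observed, threshold


if __name__ == "__main__":
    old = [0] * 100 + [1] * 100 + [2] * 100   # roughly uniform block
    new = [0] * 250 + [1] * 25 + [2] * 25     # strongly skewed block
    print(detect_drift(old, new))
```

In a streaming setting, a test of this kind would be run on each incoming block against the block used to train the current classifier, and the classifier would be relearned only when the test flags drift.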