Classifying evolving data streams with partially labeled data

  • Authors:
  • Hanen Borchani;Pedro Larraòaga;Concha Bielza

  • Affiliations:
  • Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain;Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain;Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, several approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most are based on supervised classification algorithms assuming that true labels are immediately and entirely available in the data streams. Unfortunately, such an assumption is often violated in real-world applications given that it is expensive or because it takes a long time to obtain all true labels. To deal with this problem, we propose in this paper a new semi-supervised approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, contrary to existing approaches, we monitor three possible kinds of drift: feature, conditional or dual drift. Drift detection is based on a hypothesis test comparing Kullback-Leibler divergence between old and recent data, whose distribution under the null hypothesis of coming from the same distribution is approximated via a bootstrap method. Then, if any drift occurs, a new classifier is learned from the recent data using the EM algorithm; otherwise, the current classifier is left unchanged. Our approach is so general that it can be applied to different classification models. Experimental studies, using the naive Bayes classifier and logistic regression, on both synthetic and real-world data sets demonstrate that our approach performs well.