Classifying Data Streams with Skewed Class Distributions and Concept Drifts

  • Authors: Jing Gao; Bolin Ding; Wei Fan; Jiawei Han; Philip S. Yu

  • Affiliations: University of Illinois, Urbana-Champaign; University of Illinois, Urbana-Champaign; IBM T.J. Watson Research Center; University of Illinois, Urbana-Champaign; University of Illinois, Chicago

  • Venue: IEEE Internet Computing
  • Year: 2008

Abstract

Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. More and more applications feature data streams rather than finite stored data sets, which poses a challenge for traditional classification algorithms. Concept drift and skewed class distributions, two common properties of data stream applications, make learning in streams difficult. The authors aim to develop a new approach for classifying skewed data streams that uses an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives.
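
The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the base learner, the sampling ratio, or how predictions are combined, so the nearest-centroid scorer, the `ratio` parameter, the averaging step, and all function names below are assumptions chosen only to make the example runnable. Each ensemble member sees every positive example (the "repeated samples of positives") together with a different random under-sample of the negatives.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidModel:
    """Toy base learner (an assumption, not from the paper): scores a
    point by how much closer it is to the positive centroid than to
    the negative centroid."""

    def __init__(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)

    def score(self, x):
        dp, dn = sq_dist(x, self.pos), sq_dist(x, self.neg)
        # Value in [0, 1]; 0.5 when the point is equidistant.
        return dn / (dp + dn) if dp + dn > 0 else 0.5


def build_ensemble(positives, negatives, n_models=5, ratio=1.0, seed=0):
    """Train n_models members. Every member reuses all positives and
    draws its own under-sample of negatives (hypothetical `ratio`
    negatives per positive)."""
    rng = random.Random(seed)
    k = min(len(negatives), max(1, int(ratio * len(positives))))
    return [
        CentroidModel(positives, rng.sample(negatives, k))
        for _ in range(n_models)
    ]


def predict_proba(ensemble, x):
    """Average the members' scores into one positive-class estimate."""
    return fmean(m.score(x) for m in ensemble)
```

In a streaming setting, one plausible use is to rebuild the ensemble for each incoming chunk, passing the positives retained from earlier chunks plus the negatives of the current chunk; how long positives are retained under concept drift is a design choice this sketch leaves open.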