A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution

Authors:
Peng Wang;Haixun Wang;Xiaochen Wu;Wei Wang;Baile Shi
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007



Abstract

Many applications monitor streaming data for actionable alerts such as network intrusions, transaction fraud, and biosurveillance abnormalities, and stream classification models are built for this purpose. Because of concept drift, keeping a model up to date has become one of the most challenging tasks in mining data streams. State-of-the-art approaches, including both incrementally updated classifiers and ensemble classifiers, have shown that model update is a very costly process. In this paper, we show that reducing model granularity reduces update cost: models of fine granularity enable us to efficiently pinpoint the local components of the model that are affected by a concept drift. Fine granularity also enables us to derive new model components that reflect the current data distribution, thus avoiding expensive updates on a global scale. Furthermore, the actionable alerts being monitored usually occur rarely, and existing stream classifiers cannot handle this problem. We address it and show that the low-granularity classifier handles rare events in stream data with ease. Experiments on real and synthetic data show that our approach maintains good prediction accuracy at a fraction of the model-updating cost of state-of-the-art approaches.
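The idea of localized updates can be illustrated with a small sketch. It is not the algorithm from the paper, whose details the abstract does not give. It is a toy model under stated assumptions: the feature space is split into fixed-width grid cells, each cell is a majority-vote component, and a cell is rebuilt from its own recent labels once its recent error rate exceeds a threshold. The class name `GridCellClassifier` and the parameters `cell_width`, `window`, and `max_error` are hypothetical, and rare-class handling is not modeled.

```python
from collections import Counter, defaultdict


class GridCellClassifier:
    """Toy fine-grained model: one majority-vote component per grid cell.

    Illustrative assumption only, not the classifier described in the paper.
    """

    def __init__(self, cell_width=1.0, window=4, max_error=0.5):
        self.cell_width = cell_width        # hypothetical grid resolution
        self.window = window                # recent outcomes kept per cell
        self.max_error = max_error          # local error rate that triggers a rebuild
        self.counts = defaultdict(Counter)  # cell -> class counts
        self.recent = defaultdict(list)     # cell -> [(label, was_correct), ...]
        self.rebuilt_cells = []             # log of cells that were rebuilt

    def _cell(self, x):
        # Map a numeric feature vector to the grid cell that contains it.
        return tuple(int(v // self.cell_width) for v in x)

    def predict(self, x, default=None):
        counts = self.counts.get(self._cell(x))
        if not counts:
            return default
        return counts.most_common(1)[0][0]

    def update(self, x, y):
        cell = self._cell(x)
        history = self.recent[cell]
        # Record whether the current component would have been right.
        history.append((y, self.predict(x) == y))
        del history[:-self.window]          # keep only the last `window` outcomes
        self.counts[cell][y] += 1
        errors = sum(1 for _, ok in history if not ok)
        if len(history) == self.window and errors / self.window > self.max_error:
            # Drift suspected in this cell only: rebuild it from recent labels
            # and leave every other cell untouched.
            self.counts[cell] = Counter(label for label, _ in history)
            self.recent[cell] = []
            self.rebuilt_cells.append(cell)


model = GridCellClassifier()
for _ in range(10):
    model.update((0.5, 0.5), "normal")
    model.update((5.5, 5.5), "normal")
for _ in range(3):                          # the concept drifts in one region only
    model.update((0.5, 0.5), "alert")
print(model.predict((0.5, 0.5)), model.predict((5.5, 5.5)), model.rebuilt_cells)
```

In this sketch the update cost after a drift is proportional to the size of the one affected cell, which is the intuition behind preferring many small components over one global model.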