Classifying Data Streams with Skewed Class Distributions and Concept Drifts

Authors:
Jing Gao; Bolin Ding; Wei Fan; Jiawei Han; Philip S. Yu
Affiliations:
University of Illinois, Urbana-Champaign; University of Illinois, Urbana-Champaign; IBM T.J. Watson Research Center; University of Illinois, Urbana-Champaign; University of Illinois, Chicago
Venue:
IEEE Internet Computing
Year:
2008

Abstract

Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications produce data streams rather than finite, stored data sets, and these streams pose a challenge for traditional classification algorithms. Two common properties of data stream applications make learning from streams difficult: concept drift, in which the underlying data distribution changes over time, and skewed class distributions, in which one class is far rarer than the other. The authors propose an approach for classifying skewed data streams that uses an ensemble of models, each trained on an under-sample of the negative examples combined with repeated samples of the positive examples.
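
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of "under-samples of negatives and repeated samples of positives". It assumes the stream arrives in labeled chunks, that positives (label 1) are the rare class and are carried over from earlier chunks, and that the negatives of the current chunk are split into disjoint under-samples, one per ensemble member. The names build_ensemble, predict_proba, and train_centroid_model are hypothetical, and the centroid-based base learner is a toy stand-in chosen to keep the example self-contained, not the learner used by the authors.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]       # (features, label); 1 = rare positive
Model = Callable[[Sequence[float]], float]  # returns an estimate of P(positive | x)


def train_centroid_model(data: List[Example]) -> Model:
    """Toy base learner (assumption): score by relative distance to class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    pos_c = centroid([x for x, y in data if y == 1])
    neg_c = centroid([x for x, y in data if y == 0])

    def predict(x):
        dp, dn = dist(x, pos_c), dist(x, neg_c)
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if dp + dn == 0 else dn / (dp + dn)

    return predict


def build_ensemble(chunk: List[Example], past_positives: List[Example],
                   k: int = 5, seed: int = 0):
    """Train up to k models on (all known positives + one disjoint slice of negatives)."""
    rng = random.Random(seed)
    # Repeated positives: every model sees positives from past and current chunks.
    positives = list(past_positives) + [e for e in chunk if e[1] == 1]
    negatives = [e for e in chunk if e[1] == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    rng.shuffle(negatives)
    # Under-sampled negatives: k disjoint slices of the current chunk's negatives.
    slices = [negatives[i::k] for i in range(k)]
    models = [train_centroid_model(positives + s) for s in slices if s]
    return models, positives  # positives are returned for reuse with the next chunk


def predict_proba(models: List[Model], x: Sequence[float]) -> float:
    """Average the members' estimates of P(positive | x)."""
    return sum(m(x) for m in models) / len(models)
```

In this sketch, only the most recent chunk contributes negatives, which is one simple way to follow a drifting concept, while reusing positives across chunks counteracts the skew. Whether old positives should eventually be discarded under drift is a design choice the abstract leaves open.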