SERA: selectively recursive approach towards nonstationary imbalanced stream data mining

Authors:
Sheng Chen;Haibo He
Affiliations:
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ;Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ
Venue:
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Year:
2009

Abstract

Recent years have witnessed rapidly growing interest in stream data mining. Despite the considerable success achieved so far, current approaches generally assume that the class distribution of the stream data is relatively balanced. However, in applications such as network intrusion detection, credit fraud detection, spam classification, and many others, the class distribution is usually imbalanced and misclassifying a minority example is very costly. Concept drift is an unavoidable issue in stream data mining, and it is even more difficult to handle when the classifier has to learn from an imbalanced data stream whose target concept keeps drifting. In this article, we propose a selectively recursive approach (SERA) for learning from nonstationary imbalanced data streams. By selectively absorbing previously received minority examples into the current training data chunk and potentially assigning sampling probabilities proportionally to the majority and minority examples, SERA alleviates the difficulty that conventional stream data mining methods face when learning from nonstationary imbalanced data streams. Experiments on synthetic datasets show that, compared to existing approaches, our approach is competitive on general assessment metrics and achieves significant performance improvement in predicting minority instances.
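The idea of selectively absorbing past minority examples into the current chunk can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so the sketch assumes one (Euclidean distance to the mean of the current chunk's minority examples). The function name `absorb_past_minority` and the parameters `n_absorb` and `minority_label` are hypothetical.

```python
import numpy as np


def absorb_past_minority(X_chunk, y_chunk, past_minority, n_absorb, minority_label=1):
    """Augment the current chunk with the past minority examples closest to it.

    Assumption (not from the paper's abstract): closeness is measured as the
    Euclidean distance to the mean of the current chunk's minority examples.
    """
    current_minority = X_chunk[y_chunk == minority_label]
    # Nothing to select from, or no reference point in the current chunk.
    if len(past_minority) == 0 or len(current_minority) == 0 or n_absorb <= 0:
        return X_chunk, y_chunk

    center = current_minority.mean(axis=0)
    distances = np.linalg.norm(past_minority - center, axis=1)
    # Indices of the n_absorb past minority examples nearest to the center.
    keep = np.argsort(distances)[:n_absorb]

    X_aug = np.vstack([X_chunk, past_minority[keep]])
    y_aug = np.concatenate(
        [y_chunk, np.full(len(keep), minority_label, dtype=y_chunk.dtype)]
    )
    return X_aug, y_aug


# Toy chunk: two majority examples (label 0) and one minority example (label 1).
X_chunk = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
y_chunk = np.array([0, 0, 1])
# Minority examples collected from earlier chunks; the second one is stale.
past_minority = np.array([[5.1, 4.9], [-3.0, -3.0], [4.8, 5.2]])

X_aug, y_aug = absorb_past_minority(X_chunk, y_chunk, past_minority, n_absorb=2)
```

In this toy run the stale example at (-3, -3) is left out, so the augmented chunk holds three minority examples instead of one. A base classifier would then be trained on `X_aug` and `y_aug`; how sampling probabilities are assigned to majority and minority examples is not specified in the abstract and is not modeled here.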