SERA: selectively recursive approach towards nonstationary imbalanced stream data mining

  • Authors: Sheng Chen; Haibo He

  • Affiliations: Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ (both authors)

  • Venue: IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
  • Year: 2009

Abstract

Recent years have witnessed rapidly growing interest in stream data mining. Despite the success achieved so far, current approaches generally assume that the class distribution of the stream data is relatively balanced. However, in applications such as network intrusion detection, credit fraud detection, spam classification, and many others, the class distribution is mostly imbalanced and misclassifying a minority example is very costly. Concept drift is an unavoidable issue in stream data mining research, and it is even more difficult to handle when the classifier has to learn from an imbalanced data stream whose target concept keeps drifting. In this article, we propose a selectively recursive approach (SERA) to deal with the problem of learning from nonstationary imbalanced data streams. By selectively absorbing previously received minority examples into the current training data chunk and potentially assigning sampling probabilities proportionally to the majority and minority examples, SERA can alleviate the difficulty that conventional stream data mining methods face when they have to learn from nonstationary imbalanced data streams. Experiments on synthetic datasets show that, compared with existing approaches, our approach is competitive on general assessment metrics and is capable of significant performance improvement in predicting minority instances.
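
The absorption step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how past minority examples are selected, so this example assumes, purely for illustration, that the past minority examples closest (in Euclidean distance) to the mean of the current chunk's minority examples are absorbed until a desired minority fraction is reached. The function name `absorb_minority`, the parameter `target_ratio`, and the selection rule are all hypothetical and are not taken from the paper; the sampling-probability step is not covered.

```python
import math


def absorb_minority(chunk, past_minority, target_ratio=0.5, minority_label=1):
    """Return the chunk augmented with selected past minority examples.

    chunk: list of (features, label) pairs for the current data chunk.
    past_minority: list of feature tuples retained from earlier chunks.
    target_ratio: hypothetical desired minority fraction after absorption.
    """
    current_min = [x for x, y in chunk if y == minority_label]
    n_major = len(chunk) - len(current_min)

    # minority / (minority + majority) = r  =>  minority = r * majority / (1 - r)
    wanted = math.ceil(target_ratio * n_major / (1.0 - target_ratio))
    n_extra = max(0, wanted - len(current_min))

    # Without a current minority example there is no reference point for the
    # (assumed) similarity ranking, so the chunk is returned unchanged.
    if n_extra == 0 or not past_minority or not current_min:
        return list(chunk)

    # Assumed selection rule: rank past minority examples by Euclidean
    # distance to the centroid of the current minority examples.
    dim = len(current_min[0])
    centroid = [sum(x[d] for x in current_min) / len(current_min)
                for d in range(dim)]
    ranked = sorted(past_minority, key=lambda x: math.dist(x, centroid))

    return list(chunk) + [(x, minority_label) for x in ranked[:n_extra]]


# Toy chunk: four majority examples (label 0) and one minority example (label 1).
chunk = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
         ((0.3, 0.3), 0), ((1.0, 1.0), 1)]
past = [(5.0, 5.0), (1.1, 0.9), (0.9, 1.2), (3.0, 3.0)]
augmented = absorb_minority(chunk, past, target_ratio=0.5)
```

In this toy run the three past examples nearest the current minority example are absorbed, which balances the chunk at four majority and four minority examples, while the most distant past example is left out. A classifier would then be trained on `augmented` instead of the raw chunk.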