Adaptive methods for classification in arbitrarily imbalanced and drifting data streams

  • Authors: Ryan N. Lichtenwalter; Nitesh V. Chawla
  • Affiliations: The University of Notre Dame, Notre Dame, IN (both authors)
  • Venue: PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining
  • Year: 2009

Abstract

Streaming data is pervasive in a multitude of data mining applications. One fundamental problem in the task of mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which can further complicate the task. In scenarios like these, class imbalance is particularly difficult to overcome and has not been as thoroughly studied. In this paper, we comprehensively consider the issues of changing distributions in conjunction with high degrees of class imbalance in streaming data. We propose new approaches based on distributional divergence and meta-classification that improve several performance metrics often applied in the study of imbalanced classification. We also propose a new distance measure for detecting distributional drift and examine its utility in weighting ensemble base classifiers. We employ a sequential validation framework, which we believe is the most meaningful option in the context of streaming imbalanced data.
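
To make the idea of weighting ensemble base classifiers by distributional distance more concrete, here is a minimal sketch. The abstract does not specify the proposed distance measure, so Hellinger distance over binned feature values is used purely as a stand-in. The function names (`histogram`, `hellinger`, `batch_distance`, `ensemble_weights`), the fixed bin edges, and the "1 minus distance" weighting rule are all illustrative assumptions, not the authors' method.

```python
import math


def histogram(values, edges):
    """Relative bin frequencies of `values` over the bins defined by `edges`."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = 0  # values below the first edge fall into the first bin
        for i in range(n_bins):
            if v >= edges[i]:
                idx = i  # values above the last edge end up in the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def batch_distance(old_batch, new_batch, edges):
    """Average per-feature Hellinger distance between two batches of rows."""
    n_features = len(old_batch[0])
    total = 0.0
    for j in range(n_features):
        p = histogram([row[j] for row in old_batch], edges)
        q = histogram([row[j] for row in new_batch], edges)
        total += hellinger(p, q)
    return total / n_features


def ensemble_weights(past_batches, current_batch, edges):
    """Hypothetical rule: weight each past batch's classifier by
    (1 - distance to the current batch), normalized to sum to 1."""
    sims = [1.0 - batch_distance(b, current_batch, edges) for b in past_batches]
    total = sum(sims)
    if total == 0:
        # All past batches look maximally different: fall back to uniform.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


if __name__ == "__main__":
    edges = [0, 1, 2, 3]
    similar = [[0.5], [0.5]]    # same region as the current batch
    drifted = [[2.5], [2.5]]    # feature values have shifted
    current = [[0.5], [0.5]]
    print(ensemble_weights([similar, drifted], current, edges))
```

In this toy setup, a classifier trained on a batch whose feature distribution matches the current batch receives all of the weight, while one trained on a fully drifted batch receives none. A real stream would need a binning strategy suited to the data and some handling of the class imbalance the abstract emphasizes, neither of which this sketch attempts.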