Certainty-Enhanced Active Learning for Improving Imbalanced Data Classification

  • Authors: Jui Hsi Fu; Sing Ling Lee

  • Venue: ICDMW '11: Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops
  • Year: 2011

Abstract

Active learning algorithms usually query true labels for informative samples, chosen according to the disagreement among existing hypotheses. However, we observed that when a streaming dataset has skewed class membership, active learning suffers from the imbalanced data classification problem: the minority class is overwhelmed by the majority class when the hypotheses are generated. In this paper, we propose that, for each unlabeled sample, only the local behavior in its certainty-enhanced neighborhood, rather than the entire dataset, be used to generate the error-minimization hypotheses. Consequently, the proposed method improves the predictions of the hypotheses and determines the query probabilities properly. Our experiments use synthetic and real-world datasets to demonstrate the effectiveness of the approach. The results show that the proposed approach decreases the probability of querying a certain (majority) sample and can handle the imbalanced data classification problem in active learning.
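
The abstract does not give the paper's formulas, so the following is only a rough sketch of the general idea of deriving a query probability from a local neighborhood rather than from the whole dataset. Everything in it is an assumption made for illustration: the function names `k_nearest` and `query_probability`, the use of a plain k-nearest-neighbor neighborhood with Euclidean distance, binary labels 0 and 1, and the rule `1 - |2p - 1|`, where `p` is the fraction of positive labels among the neighbors. It is not the authors' certainty-enhanced neighborhood or their error-minimization hypotheses.

```python
import math


def k_nearest(x, labeled, k):
    """Return the k labeled (point, label) pairs closest to x (Euclidean distance)."""
    return sorted(labeled, key=lambda pair: math.dist(x, pair[0]))[:k]


def query_probability(x, labeled, k=5):
    """Hypothetical query rule (an assumption, not the paper's formula).

    Returns 0.0 when all neighbors share one label (a "certain" sample)
    and 1.0 when the neighborhood is evenly split between labels 0 and 1.
    """
    if not labeled:
        return 1.0  # nothing is known yet, so always ask for the label
    neighbors = k_nearest(x, labeled, k)
    positive_fraction = sum(1 for _, y in neighbors if y == 1) / len(neighbors)
    return 1.0 - abs(2.0 * positive_fraction - 1.0)


# Toy labeled pool: three majority-class (0) points, two minority-class (1) points.
labeled_pool = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((5, 5), 1), ((5, 6), 1)]

deep_in_majority = query_probability((0.2, 0.2), labeled_pool, k=3)
between_classes = query_probability((2.5, 2.5), labeled_pool, k=4)
```

In this toy setting, a sample deep inside the majority region gets a query probability of zero, while a sample between the two classes keeps a nonzero chance of being queried. That matches the behavior the abstract describes only qualitatively: certain (majority) samples are queried less often.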