Certainty-based active learning for sampling imbalanced datasets

  • Authors:
  • Juihsi Fu;Singling Lee

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

Active learning aims to learn an accurate classifier with as few queried labels as possible. For practical applications, we propose a Certainty-Based Active Learning (CBAL) algorithm to address imbalanced data classification in active learning. The importance of each unlabeled sample is measured within an explored neighborhood, so that irrelevant samples, which might otherwise overwhelm the minority class, do not affect the measurement. To handle the agnostic case, IWAL-ERM is integrated into our approach at no additional cost. CBAL thus determines the query probability of each unlabeled sample within its explored neighborhood. The neighborhood is explored incrementally, so its size does not need to be defined in advance. Our theoretical analysis shows that CBAL achieves a polynomial improvement in label queries over passive learning. Experimental results on synthetic and real-world datasets show that CBAL can identify informative samples and handle imbalanced data classification in active learning.
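
The abstract does not give CBAL's formulas, so the following is only a rough illustration of the general idea of assigning a query probability from an incrementally explored neighborhood. It is not the authors' algorithm. The function names, the certainty measure (absolute label balance among the nearest labeled points), the stopping rule (stop growing once both classes appear), and the floor `p_min` are all assumptions made for this sketch. The `1 / p` importance weight follows the usual importance-weighted active learning convention.

```python
import math
import random


def query_probability(x, labeled, p_min=0.1):
    """Hypothetical query probability for an unlabeled point x.

    labeled: list of (point, label) pairs with label in {-1, +1}.
    The neighborhood grows one nearest labeled point at a time and stops
    as soon as both classes have been seen, so no size is fixed up front.
    """
    if not labeled:
        return 1.0  # nothing known yet: always query
    ranked = sorted(labeled, key=lambda pl: math.dist(x, pl[0]))
    total = 0
    certainty = 1.0
    for k, (_, label) in enumerate(ranked, start=1):
        total += label
        certainty = abs(total) / k  # 1.0 while only one class has been seen
        if certainty < 1.0:
            break  # both classes present: stop exploring
    # Assumed rule: the less certain the neighborhood, the likelier the query.
    return max(p_min, 1.0 - certainty)


def maybe_query(x, labeled, oracle, rng, p_min=0.1):
    """Query the oracle with probability p; return (label, weight) or None."""
    p = query_probability(x, labeled, p_min)
    if rng.random() < p:
        return oracle(x), 1.0 / p  # importance weight 1/p
    return None


if __name__ == "__main__":
    labeled = [((0.0, 0.0), 1), ((0.1, 0.0), 1), ((1.0, 1.0), -1)]
    rng = random.Random(0)
    print(query_probability((0.05, 0.0), labeled))
    print(maybe_query((0.9, 0.9), labeled, lambda x: -1, rng))
```

In this sketch a point surrounded only by one class receives the floor probability `p_min`, while a point whose nearest labeled neighbors disagree is queried with high probability. A minority-class region therefore attracts queries as soon as a single minority label appears nearby.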