Quantification and semi-supervised classification methods for handling changes in class distribution

  • Authors:
  • Jack Chongjie Xue;Gary M. Weiss

  • Affiliations:
  • Fordham University, Bronx, NY, USA;Fordham University, Bronx, NY, USA

  • Venue:
  • Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In realistic settings the prevalence of a class may change after a classifier is induced and this will degrade the performance of the classifier. Further complicating this scenario is the fact that labeled data is often scarce and expensive. In this paper we address the problem where the class distribution changes and only unlabeled examples are available from the new distribution. We design and evaluate a number of methods for coping with this problem and compare the performance of these methods. Our quantification-based methods estimate the class distribution of the unlabeled data from the changed distribution and adjust the original classifier accordingly, while our semi-supervised methods build a new classifier using the examples from the new (unlabeled) distribution which are supplemented with predicted class values. We also introduce a hybrid method that utilizes both quantification and semi-supervised learning. All methods are evaluated using accuracy and F-measure on a set of benchmark data sets. Our results demonstrate that our methods yield substantial improvements in accuracy and F-measure.