Performance thresholding in practical text classification

  • Authors: Hinrich Schütze; Emre Velipasaoglu; Jan O. Pedersen
  • Affiliations: Universität Stuttgart, Germany; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA
  • Venue: CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management
  • Year: 2006

Abstract

In practical classification, there is often a mix of learnable and unlearnable classes and only a classifier above a minimum performance threshold can be deployed. This problem is exacerbated if the training set is created by active learning. The bias of actively learned training sets makes it hard to determine whether a class has been learned. We give evidence that there is no general and efficient method for reducing the bias and correctly identifying classes that have been learned. However, we characterize a number of scenarios where active learning can succeed despite these difficulties.
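
The deployment rule described in the abstract (keep only classes whose classifier clears a minimum performance threshold) can be illustrated with a minimal sketch. This is not the authors' method. It assumes F1 as the performance measure, which the abstract does not name, and it assumes the evaluation labels come from a uniformly sampled held-out set rather than an actively selected one, since the abstract notes that actively learned sets are biased. The function names `f1_for_class` and `deployable_classes` are hypothetical.

```python
from typing import Dict, Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Per-class F1 computed from true/false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # With no positives and no predictions for the class, report 0.0.
    return 2 * tp / denom if denom else 0.0


def deployable_classes(
    y_true: Sequence[str], y_pred: Sequence[str], min_f1: float
) -> Dict[str, float]:
    """Return the classes whose estimated F1 meets the threshold.

    Assumption: y_true/y_pred come from a uniformly sampled held-out set.
    Estimates taken on actively selected examples may be biased.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = {label: f1_for_class(y_true, y_pred, label) for label in labels}
    return {label: score for label, score in scores.items() if score >= min_f1}


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    truth = ["a", "a", "b", "b", "c", "c"]
    preds = ["a", "a", "b", "c", "b", "a"]
    print(deployable_classes(truth, preds, min_f1=0.75))
```

On the toy labels, only class `a` clears a threshold of 0.75; classes `b` and `c` would be treated as not yet learned. With a small or biased evaluation sample, such estimates can be unreliable, which is the difficulty the abstract highlights.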