Learning to identify unexpected instances in the test set

  • Authors:
  • Xiao-Li Li; Bing Liu; See-Kiong Ng

  • Affiliations:
  • Institute for Infocomm Research, Singapore; Department of Computer Science, University of Illinois at Chicago, Chicago, IL; Institute for Infocomm Research, Singapore

  • Venue:
  • IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence
  • Year:
  • 2007

Abstract

Traditional classification involves building a classifier using labeled training examples from a set of predefined classes and then applying the classifier to classify test instances into the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is an important issue in practice. The problem can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document AN. The sets P and {AN} are then used to build a naïve Bayesian classifier. Our experimental results show that this method is significantly better than existing techniques.
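
The two steps named in the abstract (generate one artificial negative document AN, then train a naïve Bayesian classifier on P and {AN}) can be illustrated with a minimal Python sketch. The abstract does not say how AN is generated, so the heuristic below is an assumption made only for illustration: AN receives a pseudo-count for every word that is relatively more frequent in the unlabeled test set U than in P. The function names, the whitespace tokenizer, the Laplace smoothing, and the uniform class priors are likewise hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def word_counts(docs):
    # Total count of each word over a collection of documents.
    return Counter(w for d in docs for w in tokenize(d))


def build_artificial_negative(positive_docs, unlabeled_docs):
    # Hypothetical heuristic (not taken from the abstract): AN keeps each word
    # whose relative frequency in U exceeds its relative frequency in P,
    # weighted by that excess and rescaled to a pseudo-count.
    p_counts = word_counts(positive_docs)
    u_counts = word_counts(unlabeled_docs)
    p_total = sum(p_counts.values())
    u_total = sum(u_counts.values())
    an = {}
    for word, count in u_counts.items():
        excess = count / u_total - p_counts[word] / p_total
        if excess > 0:
            an[word] = excess * u_total
    return an


def train_naive_bayes(positive_docs, an):
    # Multinomial naive Bayes over two classes: P and the single document AN.
    # Laplace smoothing keeps every vocabulary word at a nonzero probability.
    p_counts = word_counts(positive_docs)
    vocab = set(p_counts) | set(an)

    def log_probs(counts):
        total = sum(counts.values()) + len(vocab)
        return {w: math.log((counts.get(w, 0) + 1) / total) for w in vocab}

    return {"pos": log_probs(p_counts), "neg": log_probs(an)}


def is_unexpected(doc, model):
    # Assumption: uniform class priors, so only word likelihoods are compared.
    # Words outside the training vocabulary are ignored.
    scores = {
        label: sum(lp[w] for w in tokenize(doc) if w in lp)
        for label, lp in model.items()
    }
    return scores["neg"] > scores["pos"]
```

To use the sketch, pass the labeled positive documents and the unlabeled test documents to `build_artificial_negative`, train with `train_naive_bayes`, and call `is_unexpected` on each test document; documents that score higher under the AN class are flagged as unexpected.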