Active learning with extremely sparse labeled examples

  • Authors:
  • Shiliang Sun; David R. Hardoon

  • Affiliations:
  • Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China; Data Mining Department, Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, #20-10 Connexis, Singapore

  • Venue:
  • Neurocomputing
  • Year:
  • 2010


Abstract

In active learning it is generally assumed that labeled examples are available for training a classifier, which is then used to examine unlabeled data and select the most 'informative' examples for manual labeling. However, in some application domains only a limited number of labeled examples is available, in the most extreme case a single labeled example per category. In such scenarios, most existing active learning methodologies cannot be applied directly without first making an assumption about label assignment. In this paper we present a method for finding highly informative examples for manual labeling when the labeled data available during training are extremely limited. We propose using canonical correlation analysis to investigate the correlation between different views of the available data, and we demonstrate that this measure can serve as a selection criterion for the novel application of active learning with only a single labeled example from each class. We demonstrate our method with promising experimental results on text classification, advertisement removal and multi-class image classification tasks.
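
The abstract does not spell out how the correlation between views is turned into a selection score, so the following is only a minimal sketch of the general idea: fit canonical correlation analysis on two views of the data and rank unlabeled examples by how well (or poorly) their projected views agree. The function name select_examples_for_labeling, the views X_view1/X_view2, and the cosine-agreement score are all hypothetical placeholders, not the authors' actual criterion.

```python
# Minimal sketch (not the paper's exact method): CCA-based ranking of
# unlabeled examples for manual labeling, assuming two feature views
# of the same examples (e.g., page text and link text).
import numpy as np
from sklearn.cross_decomposition import CCA


def select_examples_for_labeling(X1, X2, n_components=2, n_select=5):
    """Project both views with CCA and rank examples by a placeholder
    per-example agreement score between their projections."""
    cca = CCA(n_components=n_components)
    Z1, Z2 = cca.fit_transform(X1, X2)  # paired projections of the two views
    # Cosine similarity between the two projected views of each example.
    num = np.sum(Z1 * Z2, axis=1)
    denom = np.linalg.norm(Z1, axis=1) * np.linalg.norm(Z2, axis=1) + 1e-12
    agreement = num / denom
    # Here, examples whose views agree least under the CCA mapping are
    # treated as the candidates to send for manual labeling.
    return np.argsort(agreement)[:n_select]


if __name__ == "__main__":
    # Synthetic two-view data sharing a common latent structure.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(100, 4))
    X_view1 = latent @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(100, 20))
    X_view2 = latent @ rng.normal(size=(4, 30)) + 0.1 * rng.normal(size=(100, 30))
    print(select_examples_for_labeling(X_view1, X_view2))
```

Whether high or low cross-view agreement should count as "informative" depends on the criterion the paper actually uses; the sketch simply illustrates how CCA projections of two views can yield a per-example score to drive the selection.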