Learning model order from labeled and unlabeled data for partially supervised classification, with application to word sense disambiguation

  • Authors:
  • Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

  • Affiliations:
  • Institute for Infocomm Research, Mail Box B023, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore; Institute for Infocomm Research, Mail Box B023, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore; Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

Previous partially supervised classification methods can partition unlabeled data into positive and negative examples for a given class by learning from positive labeled examples and unlabeled examples, but they cannot further group the negative examples into meaningful clusters, even when the negative examples contain many different classes. Here we propose an automatic method that obtains a natural partitioning of mixed data (labeled data + unlabeled data) by maximizing a stability criterion, defined on the classification results of an extended label propagation algorithm, over all possible values of the model order (the number of classes) in the mixed data. Our experimental results on benchmark corpora for the word sense disambiguation task indicate that this model order identification algorithm, with the extended label propagation algorithm as the base classifier, outperforms SVM, a one-class partially supervised classification algorithm, and the model order identification algorithm with semi-supervised k-means clustering as the base classifier when the labeled data is incomplete.
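
To make the abstract's search procedure concrete, the following is a minimal Python sketch of stability-based model order selection: for each candidate number of classes k, a semi-supervised classifier is run several times on randomly seeded versions of the mixed data, and the agreement between the induced partitions is taken as a stability score; the k with the highest score is selected. It uses scikit-learn's standard LabelPropagation as a stand-in for the paper's extended label propagation algorithm, and the random seeding of extra classes, the number of trials, and the use of the adjusted Rand index as the agreement measure are simplifying assumptions, not the authors' exact method.

    # Illustrative sketch only: stability-driven model order selection with
    # plain LabelPropagation as a stand-in for the extended algorithm.
    import numpy as np
    from sklearn.semi_supervised import LabelPropagation
    from sklearn.metrics import adjusted_rand_score

    def stability_for_order(X, labeled_mask, labels, k, n_trials=5, rng=None):
        """Estimate partition stability for a candidate model order k.

        X            : (n_samples, n_features) feature matrix of the mixed data
        labeled_mask : boolean array, True where a gold label is available
        labels       : integer labels (trusted only where labeled_mask is True)
        k            : candidate number of classes (model order)
        """
        rng = np.random.default_rng(rng)
        n = X.shape[0]
        partitions = []
        for _ in range(n_trials):
            # Semi-supervised label vector: -1 marks unlabeled points.
            y = np.full(n, -1, dtype=int)
            y[labeled_mask] = labels[labeled_mask]
            # Seed each class beyond the labeled ones with a random unlabeled
            # point (a crude stand-in for discovering extra classes among the
            # negative examples).
            unlabeled_idx = np.flatnonzero(~labeled_mask)
            for c in range(labels[labeled_mask].max() + 1, k):
                y[rng.choice(unlabeled_idx)] = c
            model = LabelPropagation()
            model.fit(X, y)
            partitions.append(model.transduction_)
        # Stability = mean pairwise agreement between the induced partitions.
        scores = [adjusted_rand_score(partitions[i], partitions[j])
                  for i in range(n_trials) for j in range(i + 1, n_trials)]
        return float(np.mean(scores))

    def select_model_order(X, labeled_mask, labels, k_range):
        """Return the candidate model order with the highest stability score."""
        return max(k_range,
                   key=lambda k: stability_for_order(X, labeled_mask, labels, k))

The adjusted Rand index is used here because it is invariant to how the discovered extra classes happen to be numbered across trials, so only the shape of the partition, not the arbitrary class identifiers, affects the stability score.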