Partially supervised sense disambiguation by learning sense number from tagged and untagged corpora

  • Authors:
  • Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

  • Affiliations:
  • Institute for Infocomm Research, Heng Mui Keng Terrace, Singapore; Institute for Infocomm Research, Heng Mui Keng Terrace, Singapore; National University of Singapore, Singapore

  • Venue:
  • EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2006

Abstract

Supervised and semi-supervised sense disambiguation methods mis-tag instances of a target word when the senses of those instances are not defined in the sense inventory or when the training data contains no tagged instances of those senses. We use a model order identification method to avoid misclassifying instances with undefined senses by discovering new senses from mixed data (tagged and untagged corpora). The algorithm seeks a natural partition of the mixed data: for each candidate number of senses (the sense number, or model order), it classifies the data with an extended label propagation algorithm, evaluates a stability criterion on the classification result, and selects the value that maximizes the criterion. Experimental results on SENSEVAL-3 data indicate that it outperforms SVM, a one-class partially supervised classification algorithm, and a clustering-based model order identification algorithm when the tagged data is incomplete.
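
The selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how label propagation is extended or how stability is computed, so both are stand-ins chosen for illustration. Here `extend_seeds` (hypothetical) adds one pseudo-seed per extra sense by farthest-point selection, and `stability` (hypothetical) is a plain, unnormalized co-membership agreement between the full-data result and results on random subsamples. All function names, the Gaussian kernel width `sigma`, and the subsampling fraction are assumptions.

```python
import numpy as np

def label_propagation(X, seed_idx, seed_labels, k, sigma=1.0, iters=200):
    """Iterative label propagation with clamped seeds over a Gaussian affinity graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    T = W / W.sum(axis=1, keepdims=True)      # row-normalized transition matrix
    clamp = np.zeros((len(seed_idx), k))
    clamp[np.arange(len(seed_idx)), seed_labels] = 1.0
    Y = np.full((n, k), 1.0 / k)
    Y[seed_idx] = clamp
    for _ in range(iters):
        Y = T @ Y
        Y[seed_idx] = clamp                   # keep seed labels fixed
    return Y.argmax(axis=1)

def extend_seeds(X, seed_idx, seed_labels, k):
    """Assumed extension: one pseudo-seed per extra class, farthest from all current seeds."""
    idx, labels = list(seed_idx), list(seed_labels)
    next_label = max(labels) + 1              # tagged senses are assumed to be 0..c-1
    while next_label < k:
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
        labels.append(next_label)
        next_label += 1
    return np.array(idx), np.array(labels)

def classify(X, seed_idx, seed_labels, k):
    idx, labels = extend_seeds(X, seed_idx, seed_labels, k)
    return label_propagation(X, idx, labels, k)

def comembership(labels):
    return labels[:, None] == labels[None, :]

def stability(X, seed_idx, seed_labels, k, n_rounds=10, frac=0.8, seed=0):
    """Assumed criterion: mean co-membership agreement between full and subsampled runs."""
    rng = np.random.default_rng(seed)
    full = classify(X, seed_idx, seed_labels, k)
    others = np.setdiff1d(np.arange(X.shape[0]), seed_idx)
    scores = []
    for _ in range(n_rounds):
        kept = rng.choice(others, size=int(frac * len(others)), replace=False)
        keep = np.concatenate([seed_idx, kept])   # tagged instances are always kept
        sub = classify(X[keep], np.arange(len(seed_idx)), seed_labels, k)
        scores.append((comembership(full[keep]) == comembership(sub)).mean())
    return float(np.mean(scores))

def select_sense_number(X, seed_idx, seed_labels, k_max):
    """Try every sense number from the tagged count up to k_max; keep the most stable."""
    c = int(seed_labels.max()) + 1
    scores = {k: stability(X, seed_idx, seed_labels, k) for k in range(c, k_max + 1)}
    return max(scores, key=scores.get), scores
```

In this sketch, `X` would hold one feature vector per instance of the target word, `seed_idx` the row indices of the tagged instances, and `seed_labels` their sense ids. Because the agreement score is not normalized against a chance baseline, it can favor small sense numbers; a practical criterion would correct for that.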