Word Sense Disambiguation by Combining Labeled Data Expansion and Semi-Supervised Learning Method

  • Authors: Sanae Fujita; Akinori Fujino
  • Affiliations: Nippon Telegraph and Telephone Corporation (both authors)
  • Venue: ACM Transactions on Asian Language Information Processing (TALIP)
  • Year: 2013

Abstract

Lack of labeled data is one of the most severe problems facing word sense disambiguation (WSD). We address this problem by proposing a method that combines automatic labeled data expansion (Step 1) and semi-supervised learning (Step 2). Each step is effective on its own, and their combination yields a synergistic effect. In Step 1, we automatically extract reliable labeled data from raw corpora using dictionary example sentences, covering even infrequent and unseen senses (those unlikely to appear in labeled data). In Step 2, we apply a semi-supervised classifier and achieve a further improvement using easy-to-obtain unlabeled data. In this step, we also show that even unseen senses can be predicted. We target the SemEval-2010 Japanese WSD task, which is a lexical sample task. Step 1 and Step 2 each performed better than the best published result (76.4%). Furthermore, the combined method achieved much higher accuracy (84.2%). In this experiment, up to 50% of unseen senses were classified correctly. However, because the number of unseen senses is small, we also delete one sense per word and apply our proposed method; the results show that the method is effective and robust even for unseen senses.
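
The two-step idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how sentences are matched to dictionary examples or which semi-supervised classifier is used, so the sketch assumes Jaccard token overlap for Step 1 and plain self-training over bag-of-words counts for Step 2. All function names, thresholds, and the whitespace tokenizer are hypothetical (real Japanese text would need a morphological analyzer).

```python
from collections import Counter


def tokens(sentence):
    # Assumption: whitespace tokenization, for illustration only.
    return set(sentence.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_labeled(dictionary_examples, raw_sentences, threshold=0.5):
    """Step 1 (sketch): give a raw sentence the sense of its most similar
    dictionary example, but only when the similarity reaches `threshold`."""
    expanded = []
    for sent in raw_sentences:
        best_sense, best_sim = None, 0.0
        for sense, example in dictionary_examples:
            sim = jaccard(tokens(sent), tokens(example))
            if sim > best_sim:
                best_sense, best_sim = sense, sim
        if best_sense is not None and best_sim >= threshold:
            expanded.append((best_sense, sent))
    return expanded


def train(labeled):
    """Build per-sense word counts from (sense, sentence) pairs."""
    model = {}
    for sense, sent in labeled:
        model.setdefault(sense, Counter()).update(tokens(sent))
    return model


def predict(model, sentence):
    """Return (best sense, margin over the runner-up sense)."""
    words = tokens(sentence)
    scores = {}
    for sense, counts in model.items():
        total = sum(counts.values()) or 1
        scores[sense] = sum(counts[w] for w in words) / total
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return ranked[0][0], ranked[0][1] - runner_up


def self_train(labeled, unlabeled, min_margin=0.05, rounds=3):
    """Step 2 (sketch): self-training, a stand-in for the paper's
    semi-supervised classifier. Confident predictions on unlabeled
    sentences are added to the training set and the model is retrained."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for sent in pool:
            sense, margin = predict(model, sent)
            if margin >= min_margin:
                confident.append((sense, sent))
            else:
                rest.append(sent)
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return train(labeled)
```

In this sketch, a sense that never occurs in the hand-labeled data can still enter the model through Step 1, because its dictionary example sentence pulls matching sentences out of the raw corpus; Step 2 then reinforces it with unlabeled data.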