Practical Word-Sense Disambiguation Using Co-occurring Concept Codes

  • Authors:
  • Youjin Chung;Jong-Hyeok Lee

  • Affiliations:
  • Div. of Electrical and Computer Engineering, POSTECH and Advanced Information Technology Research Center (AITre), Pohang, Republic of Korea 790-784;Div. of Electrical and Computer Engineering, POSTECH and Advanced Information Technology Research Center (AITre), Pohang, Republic of Korea 790-784

  • Venue:
  • Machine Translation
  • Year:
  • 2005

Abstract

Most previous corpus-based approaches to resolving word-sense ambiguity have collected lexical information from the context of the word to be disambiguated. However, they suffer from data sparseness. To address this problem, this paper proposes a disambiguation method using co-occurring concept codes (CCCs). The use of concept-code features, together with concept-code generalization, effectively alleviates the data sparseness problem and also reduces the number of features to a practical size without any loss in system performance. We demonstrate the effectiveness of the CCC features and of concept-code generalization through experimental evaluations. The proposed disambiguation method is applied to a Korean-to-Japanese MT system and tested with various machine-learning techniques. In a lexical sample evaluation, our CCC-based method achieved a precision of 82.00%, an 11.83% improvement over the baseline. It also achieved a precision of 83.51% in an experiment on real text, which indicates that the proposed method is useful for practical MT systems.
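The abstract does not specify how concept codes are represented or generalized, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a hypothetical toy thesaurus (`TOY_THESAURUS`) in which each word maps to a digit-string concept code and a shorter prefix denotes a more general concept; the function names `generalize` and `ccc_features` are likewise invented for illustration.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption for this sketch): word -> concept code.
# Each extra digit is assumed to denote a more specific level of the hierarchy.
TOY_THESAURUS = {
    "river": "112",
    "stream": "113",
    "water": "114",
    "money": "231",
    "loan": "232",
    "deposit": "233",
}


def generalize(code, level):
    """Map a concept code to its ancestor at the given depth by truncation."""
    return code[:level]


def ccc_features(context_words, level):
    """Count the concept codes (generalized to `level`) found in a context.

    Words missing from the thesaurus are skipped.
    """
    codes = (TOY_THESAURUS[w] for w in context_words if w in TOY_THESAURUS)
    return Counter(generalize(c, level) for c in codes)


# Two hypothetical contexts for an ambiguous target word such as "bank".
financial_context = ["money", "loan", "deposit", "yesterday"]
river_context = ["river", "stream", "water"]

specific = ccc_features(financial_context, level=3)  # three distinct features
general = ccc_features(financial_context, level=1)   # one shared feature
```

In this toy setting, generalization collapses three sparse features into a single, more frequent one, which is the kind of effect the abstract attributes to concept-code generalization: fewer features and less sparseness. The resulting counts could then be passed to any classifier as a feature vector.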