Hybrid word sense disambiguation using language resources for transliteration of Arabic numerals in Korean

  • Authors:
  • Minho Kim;Youngim Jung;Hyuk-Chul Kwon

  • Affiliations:
  • Pusan National University, Busan, Korea;KISTI, Daejeon, Korea;Pusan National University, Busan, Korea

  • Venue:
  • Proceedings of the 2009 International Conference on Hybrid Information Technology
  • Year:
  • 2009

Abstract

Arabic numerals occur frequently in informative texts, and their multiple senses and readings degrade the accuracy of text-to-speech (TTS) systems. This paper presents a hybrid word sense disambiguation method that exploits a tagged corpus and a Korean wordnet, KorLex 1.0, to convert Arabic numerals into Korean phonemes correctly and efficiently according to their senses. Individual contextual features are extracted from the tagged corpus and grouped in order to determine the sense of each Arabic numeral. The least upper bound synsets among the common hypernyms of the contextual features are obtained from the KorLex hierarchy and used as the semantic categories of those features. The semantic classes are used to train a decision tree that classifies the meaning and reading of Arabic numerals, and to compose grapheme-to-phoneme rules for an automatic transliteration system for Arabic numerals. The proposed system outperforms the customized TTS systems by 3.9% to 20.3%.
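The idea of taking the least upper bound among common hypernyms can be illustrated with a minimal sketch. The hierarchy below is a hypothetical toy example written for illustration, not data from KorLex, and the function names `ancestors` and `least_upper_bound` are assumptions rather than anything defined in the paper. The sketch also assumes a tree-shaped hierarchy in which each concept has a single parent, which a real wordnet does not guarantee.

```python
# Toy hypernym hierarchy (child -> parent).
# Hypothetical example data; not taken from KorLex.
HYPERNYM = {
    "apple": "fruit",
    "pear": "fruit",
    "fruit": "food",
    "bread": "food",
    "food": "entity",
    "hour": "time_unit",
    "minute": "time_unit",
    "time_unit": "entity",
}


def ancestors(node):
    """Return the path from node up to the root, with node itself first."""
    path = [node]
    while node in HYPERNYM:
        node = HYPERNYM[node]
        path.append(node)
    return path


def least_upper_bound(words):
    """Return the most specific concept that subsumes every word.

    Assumes a single-parent (tree-shaped) hierarchy, so the first shared
    node on the upward path of any word is the deepest common hypernym.
    """
    common = set(ancestors(words[0]))
    for word in words[1:]:
        common &= set(ancestors(word))
    for node in ancestors(words[0]):
        if node in common:
            return node
    return None


if __name__ == "__main__":
    print(least_upper_bound(["apple", "pear"]))
    print(least_upper_bound(["apple", "bread"]))
    print(least_upper_bound(["hour", "minute"]))
```

In a setup like the one the abstract describes, the returned concept would play the role of a semantic category: context words that share a least upper bound are grouped under it, and the category, rather than each individual word, becomes a feature for the classifier.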