Word sense disambiguation of Thai language with unsupervised learning

  • Authors:
  • Sunee Pongpinigpinyo; Wanchai Rivepiboon

  • Affiliations:
  • Computer Engineering Department, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand; Computer Engineering Department, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand

  • Venue:
  • KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
  • Year:
  • 2005

Abstract

Several strategies can be employed to resolve word sense ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy, using an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an unsupervised learning method, applied to the task of word sense disambiguation for Thai nouns and verbs. We report experiments on two Thai polysemous words, /hua4/ and /kep1/, which serve as representatives of Thai nouns and verbs respectively. The results demonstrate the effectiveness of the approach and indicate the potential of applying vector-based distributional information measures to semantic disambiguation. Our approach performs better than a baseline system that picks the most frequent sense.
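
As a rough illustration of the vector-based idea mentioned in the abstract, the sketch below builds a term-by-context matrix, reduces it with a truncated SVD (the core step of LSI), and compares contexts by cosine similarity in the reduced space. It is not the authors' implementation: the English toy contexts, the sense tags "A" and "B", the choice of k = 2, and the helper names `bag_of_words`, `to_latent`, and `cosine` are all assumptions made for the example, and the sense tags are used only to read off the result.

```python
import numpy as np

# Hypothetical toy contexts (bags of neighbouring words) around one ambiguous
# target word. These are illustrative placeholders, not data from the paper.
contexts = [
    ["river", "water", "shore"],
    ["river", "shore", "fishing"],
    ["money", "loan", "account"],
    ["money", "account", "deposit"],
]
senses = ["A", "A", "B", "B"]  # hypothetical tags, used only to read off the result

vocab = sorted({w for ctx in contexts for w in ctx})
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(words):
    """Raw term-count vector over the toy vocabulary (unknown words are skipped)."""
    vec = np.zeros(len(vocab))
    for w in words:
        if w in index:
            vec[index[w]] += 1.0
    return vec

# Term-by-context matrix: one column per context.
X = np.column_stack([bag_of_words(ctx) for ctx in contexts])

# Truncated SVD: keep only the k strongest latent dimensions (k is an assumption).
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :k]

def to_latent(words):
    """Project a context onto the k-dimensional latent space."""
    return U_k.T @ bag_of_words(words)

def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A new context that shares only one word with each sense-A context.
query = ["water", "fishing"]
q = to_latent(query)
sims = [cosine(q, to_latent(ctx)) for ctx in contexts]
predicted = senses[int(np.argmax(sims))]
print(predicted, [round(s, 3) for s in sims])
```

In this toy setup the query overlaps each sense-A context by a single word, yet in the latent space it lies close to both of them and far from the sense-B contexts, which is the kind of distributional evidence an LSI-based disambiguator can exploit.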