Selecting the best feature set for Thai word sense disambiguation using support vector machines

  • Authors:
  • Chutchada Nusai; Yoshimi Suzuki; Haruaki Yamazaki

  • Affiliations:
  • Department of Computer Science and Media Engineering, Faculty of Engineering, University of Yamanashi, Japan (all three authors)

  • Venue:
  • AIAP'07: Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications
  • Year:
  • 2007

Abstract

This paper proposes a method of selecting the best feature set for Thai word sense disambiguation using the Support Vector Machine (SVM) algorithm. This research focuses on Thai verb sense disambiguation. Many approaches have been employed to resolve word sense ambiguity with a reasonable degree of accuracy; our research follows the corpus-based approach, which employs a supervised machine learning method for disambiguation. The machine learning method is able to select suitable features. To find the best feature set for resolving Thai word sense ambiguity, our method determines the sense of an ambiguous word from the characteristics of the words that co-occur with it in sentences extracted from a Thai corpus. The ambiguous words are evaluated with 30 feature sets drawn from three feature types: "word", "part of speech (POS)", and "semantic concept (SM)". The results show that the "word & SM" feature set is the best sense indicator, with an accuracy of approximately 90-96%.
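The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes that each context token carries a word, a POS tag, and a semantic concept (SM) label, that features are taken from a +/-2 word window around the ambiguous verb, and that one linear SVM is trained per sense (one-vs-rest). The toy sentences, the romanised verb "kin", the window size, the helper names (`extract_features`, `train_binary_svm`, `train_one_vs_rest`, `evaluate`), and the Pegasos-style subgradient solver are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
from collections import defaultdict

# Hypothetical toy corpus (invented, not from the paper): each entry is a list of
# (word, POS, semantic concept) tokens, the index of the ambiguous verb, and its sense.
# "kin" is an invented romanised stand-in for an ambiguous Thai verb.
TRAIN = [
    ([("child", "NOUN", "HUMAN"), ("kin", "VERB", "-"), ("rice", "NOUN", "FOOD")], 1, "eat"),
    ([("dog", "NOUN", "ANIMAL"), ("kin", "VERB", "-"), ("meat", "NOUN", "FOOD")], 1, "eat"),
    ([("car", "NOUN", "VEHICLE"), ("kin", "VERB", "-"), ("petrol", "NOUN", "FUEL")], 1, "consume"),
    ([("truck", "NOUN", "VEHICLE"), ("kin", "VERB", "-"), ("oil", "NOUN", "FUEL")], 1, "consume"),
]
TEST = [
    ([("man", "NOUN", "HUMAN"), ("kin", "VERB", "-"), ("noodle", "NOUN", "FOOD")], 1, "eat"),
    ([("bus", "NOUN", "VEHICLE"), ("kin", "VERB", "-"), ("diesel", "NOUN", "FUEL")], 1, "consume"),
]


def extract_features(tokens, target, feature_types, window=2):
    """Binary features from words co-occurring with the target (assumed +/- window)."""
    feats = {}
    for i in range(max(0, target - window), min(len(tokens), target + window + 1)):
        if i == target:
            continue  # the ambiguous word itself is not a context feature
        word, pos, sm = tokens[i]
        if "word" in feature_types:
            feats["w=" + word] = 1.0
        if "pos" in feature_types:
            feats["p=" + pos] = 1.0
        if "sm" in feature_types:
            feats["s=" + sm] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    """Linear SVM without bias, trained by Pegasos-style subgradient steps on the
    L2-regularised hinge loss; examples are visited in a fixed order."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
            for k in list(w):
                w[k] *= 1.0 - eta * lam  # gradient step on the regulariser
            if violated:
                for k, v in x.items():
                    w[k] += eta * y * v
    return dict(w)


def train_one_vs_rest(xs, senses, **kw):
    """One binary SVM per sense (an assumed multi-class strategy)."""
    return {s: train_binary_svm(xs, [1 if g == s else -1 for g in senses], **kw)
            for s in sorted(set(senses))}


def predict(models, x):
    # Pick the sense whose SVM gives the highest decision value.
    return max(models, key=lambda s: dot(models[s], x))


def evaluate(feature_types):
    """Train on TRAIN with the given feature set and return accuracy on TEST."""
    xs = [extract_features(toks, tgt, feature_types) for toks, tgt, _ in TRAIN]
    models = train_one_vs_rest(xs, [sense for _, _, sense in TRAIN])
    hits = sum(predict(models, extract_features(toks, tgt, feature_types)) == sense
               for toks, tgt, sense in TEST)
    return hits / len(TEST)


if __name__ == "__main__":
    for types in ({"word"}, {"word", "sm"}):
        print(sorted(types), evaluate(types))
```

In this toy setup the test sentences contain only words never seen in training, so word-only features give every sense a score of zero, while the SM labels (HUMAN/FOOD, VEHICLE/FUEL) carry over from the training sentences. This illustrates one plausible reason a "word & SM" combination can generalise better; it does not reproduce the paper's experiments or its 30 feature sets. In practice a standard SVM library would replace the hand-written solver.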