Selecting the best feature set for Thai word sense disambiguation using support vector machines

Authors:
Chutchada Nusai;Yoshimi Suzuki;Haruaki Yamazaki
Affiliations:
Department of Computer Science and Media Engineering, Faculty of Engineering, University of Yamanashi, Japan;Department of Computer Science and Media Engineering, Faculty of Engineering, University of Yamanashi, Japan;Department of Computer Science and Media Engineering, Faculty of Engineering, University of Yamanashi, Japan
Venue:
AIAP'07: Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications
Year:
2007

Abstract

This paper proposes a method for selecting the best feature set for Thai word sense disambiguation using the Support Vector Machine (SVM) algorithm, with a focus on Thai verb sense disambiguation. Many approaches have been employed to resolve word sense ambiguity with a reasonable degree of accuracy. Our research follows the corpus-based approach, which employs a supervised machine learning method for disambiguation; such a method is able to select suitable features. To find the best feature set for resolving Thai word sense ambiguity, our method uses characteristics of the words that co-occur with the ambiguous word in sentences extracted from a Thai corpus to determine the sense of the ambiguous word. The ambiguous words are evaluated with 30 feature sets composed of "word", "part of speech (POS)" and "semantic concept (SM)" features. The results show that the "word & SM" feature set is the best sense indicator, with an accuracy rate of approximately 90-96%.
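The abstract does not specify the feature encoding or the SVM configuration, so the following Python sketch only illustrates the general idea under stated assumptions. Each context token is a hypothetical (word, POS, semantic concept) triple; features are binary indicators for tokens inside an assumed window around the ambiguous verb; the `use` parameter selects which feature types form the feature set (e.g. "word & SM"); and senses are separated by one-vs-rest linear SVMs trained with the Pegasos stochastic subgradient method, a stand-in for whatever SVM implementation the authors used. All names (`extract_features`, `train_binary_svm`, `train_one_vs_rest`, `predict`), the window size, the concept labels and the toy sentences for the verb "kin" (romanized; "eat" vs. "consume") are invented for illustration and do not come from the paper or its corpus.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical token layout: (surface word, POS tag, semantic concept).
# The paper's actual semantic-concept (SM) lexicon is not reproduced here.
Token = Tuple[str, str, str]
Features = Dict[str, float]


def extract_features(sentence: Sequence[Token], target: int,
                     use: Sequence[str] = ("word", "sm"),
                     window: int = 3) -> Features:
    """Binary bag of features from tokens co-occurring with the ambiguous word.

    `use` picks the feature types, e.g. ("word",), ("pos",), ("word", "sm").
    `window` (tokens on each side) is an assumed value, not taken from the paper.
    """
    feats: Features = {"bias": 1.0}  # constant feature stands in for an SVM bias term
    lo, hi = max(0, target - window), min(len(sentence), target + window + 1)
    for i in range(lo, hi):
        if i == target:
            continue  # skip the ambiguous word itself
        word, pos, concept = sentence[i]
        if "word" in use:
            feats["w=" + word] = 1.0
        if "pos" in use:
            feats["p=" + pos] = 1.0
        if "sm" in use:
            feats["s=" + concept] = 1.0
    return feats


def dot(w: Features, x: Features) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(xs: List[Features], ys: List[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Features:
    """Linear SVM trained with Pegasos (stochastic subgradient on the
    L2-regularized hinge loss). Labels must be +1 or -1."""
    rng = random.Random(seed)
    w: Features = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1.0  # hinge loss is active
            for f in w:  # gradient step on the L2 regularizer (shrink weights)
                w[f] *= 1.0 - eta * lam
            if violated:  # subgradient step on the hinge loss
                for f, v in xs[i].items():
                    w[f] = w.get(f, 0.0) + eta * ys[i] * v
    return w


def train_one_vs_rest(xs: List[Features], senses: List[str], **kw) -> Dict[str, Features]:
    """One binary SVM per sense (one-vs-rest multi-class reduction)."""
    return {s: train_binary_svm(xs, [1 if y == s else -1 for y in senses], **kw)
            for s in sorted(set(senses))}


def predict(models: Dict[str, Features], x: Features) -> str:
    """Return the sense whose SVM gives the highest decision value."""
    return max(models, key=lambda s: dot(models[s], x))


if __name__ == "__main__":
    # Invented toy data for the Thai verb "kin" (romanized): "eat" vs. "consume".
    train = [
        ([("chan", "PRON", "HUMAN"), ("kin", "VERB", "ACTION"), ("khao", "NOUN", "FOOD")], 1, "eat"),
        ([("mae", "NOUN", "HUMAN"), ("kin", "VERB", "ACTION"), ("pla", "NOUN", "FOOD")], 1, "eat"),
        ([("rot", "NOUN", "VEHICLE"), ("kin", "VERB", "ACTION"), ("naman", "NOUN", "FUEL")], 1, "consume"),
        ([("khrueang", "NOUN", "MACHINE"), ("kin", "VERB", "ACTION"), ("fai", "NOUN", "ENERGY")], 1, "consume"),
    ]
    feature_set = ("word", "sm")  # the "word & SM" combination
    xs = [extract_features(sent, idx, feature_set) for sent, idx, _ in train]
    models = train_one_vs_rest(xs, [sense for _, _, sense in train])
    unseen = [("dek", "NOUN", "HUMAN"), ("kin", "VERB", "ACTION"), ("kuaitiao", "NOUN", "FOOD")]
    print(predict(models, extract_features(unseen, 1, feature_set)))  # prints "eat"
```

Because "dek" and "kuaitiao" never occur in the toy training sentences, their word features carry zero weight, and the decision for that input rests on the semantic-concept features (plus the bias). Comparing feature sets in this sketch amounts to re-running training and evaluation with different `use` tuples; the paper's 30 feature sets and their exact definitions are not reproduced here.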