Comparison of three machine-learning methods for Thai part-of-speech tagging

  • Authors: Masaki Murata; Qing Ma; Hitoshi Isahara

  • Affiliations: Communications Research Laboratory (all authors)

  • Venue: ACM Transactions on Asian Language Information Processing (TALIP)
  • Year: 2002

Abstract

The elastic-input neuro-tagger and the hybrid tagger, which combines a neural network with Brill's error-driven learning, were proposed earlier as practical taggers that need as little training data as possible. Trained on a small Thai corpus, they reach tagging accuracies of 94.4% and 95.5%, respectively (counting only words whose part of speech is ambiguous). In this study, to construct more accurate taggers, we developed new tagging methods based on three machine-learning approaches: decision lists, maximum entropy, and support vector machines. We then ran tagging experiments with each of them. The support vector machine method achieved the highest accuracy (96.1%), showing that it can improve tagging accuracy for Thai. A statistical test (a sign test) confirmed the improvement. Finally, we examined all of these methods theoretically to determine how the improvements were achieved. We found that they stem from our use of word information, which is helpful for tagging, and from the strong performance of the support vector machine.
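
The abstract does not say how the sign test was set up, so the following is only a minimal sketch of one common way to apply a two-sided sign test when comparing two taggers on the same tokens. It assumes the test is run over the tokens on which exactly one tagger is correct; the function name `sign_test` and the toy tag sequences are hypothetical and do not come from the paper.

```python
from math import comb


def sign_test(gold, tags_a, tags_b):
    """Two-sided sign test over tokens where exactly one tagger is correct.

    Returns (a_only, b_only, p_value). Tokens on which both taggers are
    right, or both are wrong, carry no information and are ignored.
    """
    a_only = sum(1 for g, a, b in zip(gold, tags_a, tags_b) if a == g and b != g)
    b_only = sum(1 for g, a, b in zip(gold, tags_a, tags_b) if b == g and a != g)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0  # no discordant tokens, no evidence either way
    k = min(a_only, b_only)
    # Null hypothesis: each discordant token favours either tagger with
    # probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


# Hypothetical toy data (not from the paper): tagger A is always right,
# tagger B is wrong on the first six tokens.
gold = ["N", "V", "N", "V", "N", "V", "N", "V"]
tags_a = list(gold)
tags_b = ["V", "N", "V", "N", "V", "N", "N", "V"]
a_only, b_only, p_value = sign_test(gold, tags_a, tags_b)
print(a_only, b_only, p_value)
```

A small p-value suggests that the difference between the two taggers is unlikely to be due to chance; the significance threshold is left to the reader.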