Comparison of three machine-learning methods for Thai part-of-speech tagging

Authors:
Masaki Murata;Qing Ma;Hitoshi Isahara
Affiliations:
Communications Research Laboratory;Communications Research Laboratory;Communications Research Laboratory
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2002

Citing 7

Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging
Computational Linguistics

An introduction to Support Vector Machines: and other kernel-based learning methods

Statistical Language Learning

A multi-neuro tagger using variable lengths of contexts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2

Decision lists for lexical ambiguity resolution: application to accent restoration in Spanish and French
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics

Hybrid neuro and rule-based part of speech taggers
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Bunsetsu identification using category-exclusive rules
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Cited 6

Correction of errors in a verb modality corpus for machine translation with a machine-learning method
ACM Transactions on Asian Language Information Processing (TALIP)

Is 1 noun worth 2 adjectives?: measuring relative feature utility
Information Processing and Management: an International Journal

A user-centred corporate acquisition system: a dynamic fuzzy membership functions approach
Decision Support Systems

Machine-learning-based transformation of passive Japanese sentences into active by separating training data into each input particle
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

Adaptable learning assistant for item bank management
Computers & Education

Recovering "lack of words" in text categorization for item banks
COMPSAC-W'05 Proceedings of the 29th annual international conference on Computer software and applications conference

Abstract

The elastic-input neuro-tagger and the hybrid tagger, which combines a neural network with Brill's error-driven learning, have already been proposed as ways to construct a practical tagger using as little training data as possible. When a small Thai corpus is used for training, these taggers achieve tagging accuracies of 94.4% and 95.5%, respectively (counting only words whose part of speech is ambiguous). In this study, in order to construct more accurate taggers, we developed new tagging methods using three different machine-learning approaches: the decision list, maximum entropy, and support vector machine methods. We then performed tagging experiments with them. Our results show that the support vector machine method has the best precision (96.1%) and that it is capable of improving the accuracy of tagging in the Thai language. The improvement in accuracy was also confirmed by a statistical test (a sign test). Finally, we examined all these methods theoretically in an effort to determine how the improvements were achieved. We found that the improvements were due to our use of word information, which is helpful for tagging, and to the good performance of the support vector machine.
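
The sign test mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common two-sided exact form, in which only the test words where exactly one of two taggers is correct are counted and ties are discarded; the abstract does not say which variant the authors used. The function name `sign_test_p_value` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test under H0: each tagger is equally likely to win.

    wins_a: words tagger A tagged correctly and tagger B tagged wrongly.
    wins_b: words tagger B tagged correctly and tagger A tagged wrongly.
    Words on which both taggers agree in correctness (ties) are not counted.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(wins_a, wins_b)
    # One-tail probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts, for illustration only.
p = sign_test_p_value(wins_a=15, wins_b=5)
print(f"p = {p:.4f}")
```

A small p-value would suggest that the difference between two taggers is unlikely to be due to chance alone; the threshold used for that judgment is a choice made by the analyst.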