A Semi-supervised Learning Method for Vietnamese Part-of-Speech Tagging

Authors:
Le Minh Nguyen;Bach Ngo Xuan;Cuong Nguyen Viet;Minh Pham Quang Nhat;Akira Shimazu
Venue:
KSE '10 Proceedings of the 2010 Second International Conference on Knowledge and Systems Engineering
Year:
2010

Cited by:
Using Wiktionary to improve lexical disambiguation in multiple languages. CICLing'12 Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Abstract

This paper presents a semi-supervised learning method for Vietnamese part-of-speech tagging. We consider two powerful tagging models, Conditional Random Fields (CRFs) and Guided Online-Learning models (GLs), as base learning models. We then propose a semi-supervised tagging model for both CRFs and GLs. The main idea is to use a word-cluster model as an additional source to enrich the feature space of the discriminative learning models in both the training and decoding processes. Experimental results on the Vietnamese Treebank data (VTB) show that the proposed method is effective. Our best model achieved an accuracy of 94.10% when tested on VTB, and 92.60% on an independent test.
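The idea of enriching the feature space with word-cluster information can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering method or feature templates, so the cluster map `CLUSTERS`, the bit-string cluster IDs, the prefix length, and the names `token_features` and `sentence_features` are all assumptions made for illustration. The sketch only shows how cluster identifiers might be appended to ordinary lexical features before they are passed to a discriminative tagger such as a CRF.

```python
from typing import Dict, List

# Hypothetical word-to-cluster map. In practice it would be induced from
# unlabeled text by a word-clustering method; the IDs here are invented.
CLUSTERS: Dict[str, str] = {
    "sinh": "1011",
    "sang": "0110",
    "hai": "0010",
}


def token_features(tokens: List[str], i: int,
                   clusters: Dict[str, str]) -> Dict[str, str]:
    """Return baseline lexical features plus cluster features for tokens[i]."""
    word = tokens[i].lower()
    feats = {
        "w0": word,
        "w-1": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "w+1": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Cluster features are added only when the word has a known cluster,
    # so unseen words fall back to the baseline features alone.
    cluster_id = clusters.get(word)
    if cluster_id is not None:
        feats["c0"] = cluster_id
        # A short prefix gives a coarser cluster (assumes hierarchical IDs).
        feats["c0_prefix2"] = cluster_id[:2]
    return feats


def sentence_features(tokens: List[str],
                      clusters: Dict[str, str]) -> List[Dict[str, str]]:
    """Extract one feature dictionary per token in the sentence."""
    return [token_features(tokens, i, clusters) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Sinh", "xyz", "hai"], CLUSTERS):
        print(feats)
```

Because the same extraction function would be applied to training and test sentences, the cluster features are available in both training and decoding, which matches the role the abstract describes for the word-cluster model.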