POS-tagger for English-Vietnamese bilingual corpus

Authors:
Dinh Dien;Hoang Kiem
Affiliations:
Vietnam National University of HCMC, HCM City, Vietnam;Vietnam National University of HCMC, HCM City
Venue:
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Year:
2003

Abstract

Corpus-based Natural Language Processing (NLP) tasks for popular languages such as English and French have been well studied, with satisfactory results. In contrast, corpus-based NLP for less popular languages (e.g. Vietnamese) is at a deadlock because annotated training data for these languages is lacking. Furthermore, hand-annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved to be labor-intensive and costly. In this paper, we suggest a solution that partially overcomes the shortage of annotated resources for Vietnamese by building a POS-tagger for an automatically word-aligned English-Vietnamese parallel corpus (named EVC). This POS-tagger uses the Transformation-Based Learning (TBL) method to bootstrap the POS-annotation results of the English POS-tagger by exploiting the POS information of the corresponding Vietnamese words via their word alignments in EVC. We then directly project the POS annotations from the English side to the Vietnamese side via the available word alignments. The resulting POS-annotated Vietnamese corpus will be manually corrected to serve as annotated training data for Vietnamese NLP tasks such as POS tagging, phrase chunking, parsing, and word-sense disambiguation.
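
The direct projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_pos_tags`, the Penn Treebank-style tags, the tie-break for multiply aligned words, and the example sentence pair are all assumptions made for illustration. Word alignments are assumed to be given as (English index, Vietnamese index) pairs.

```python
from typing import List, Optional, Tuple


def project_pos_tags(
    english_tags: List[str],
    num_vietnamese_words: int,
    alignments: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy each English word's POS tag to the Vietnamese word aligned with it.

    `alignments` holds (english_index, vietnamese_index) pairs. Vietnamese
    words without any alignment stay None, so they can be handled later
    (for example during manual correction).
    """
    projected: List[Optional[str]] = [None] * num_vietnamese_words
    for en_idx, vi_idx in alignments:
        # Assumption: if several English words align to one Vietnamese word,
        # keep the first tag seen. This tie-break is arbitrary.
        if projected[vi_idx] is None:
            projected[vi_idx] = english_tags[en_idx]
    return projected


# Hypothetical sentence pair: "I read books" / "Tôi đọc sách"
english_tags = ["PRP", "VBP", "NNS"]        # assumed Penn Treebank-style tags
alignments = [(0, 0), (1, 1), (2, 2)]       # assumed one-to-one alignment
vietnamese_tags = project_pos_tags(english_tags, 3, alignments)
print(vietnamese_tags)
```

In this sketch an unaligned Vietnamese word simply keeps `None`, which mirrors the abstract's point that the projected corpus still needs manual correction before it can be used as training data.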