Construction of Vietnamese corpora for named entity recognition

Authors:
Pham T. X. Thao; T. Q. Tri; Ai Kawazoe; Dien Dinh; Nigel Collier
Affiliations:
University of Information Technology - VNU of HCMC, Vietnam; University of Information Technology - VNU of HCMC, Vietnam; National Institute of Informatics, Tokyo, Japan; University of Natural Sciences - VNU of HCMC, Vietnam; National Institute of Informatics, Tokyo, Japan
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

To build an automatic named entity recognition (NER) system with a machine learning approach, a large annotated corpus is widely regarded as a necessary knowledge resource. However, building such a corpus by hand is time-consuming, labor-intensive, and expensive. The construction of NER corpora for European languages has been studied extensively, whereas less-studied languages such as Vietnamese have so far received little attention. This paper describes the construction of a Vietnamese NER corpus, annotation guidelines for Vietnamese annotators, and a tagging tool that we make publicly available. We also report a comparison with the English named entity (NE) corpus used in our multilingual NER system.
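The abstract does not describe the corpus's annotation format. As a hypothetical illustration only, the sketch below assumes MUC-style inline markup (`<ENAMEX TYPE="...">...</ENAMEX>`) and shows one simple way to pull entity mentions out of tagged sentences and count them per type. Counts like these are the kind of summary statistic one might tabulate when setting a new Vietnamese corpus beside an English one. The tag set, the function names `extract_entities` and `type_distribution`, and the example sentences are all assumptions, not the authors' format or tool.

```python
import re
from collections import Counter

# ASSUMPTION: MUC-style inline tags; the paper's real format may differ.
# The backreference \1 requires the closing tag name to match the opening one.
# Nested tags are not supported by this flat, non-greedy pattern.
ENTITY_RE = re.compile(r'<(ENAMEX|TIMEX|NUMEX) TYPE="([A-Z]+)">(.*?)</\1>')


def extract_entities(text):
    """Return (entity_type, surface_string) pairs from one tagged sentence."""
    return [(m.group(2), m.group(3)) for m in ENTITY_RE.finditer(text)]


def type_distribution(sentences):
    """Count entity mentions per type across a list of tagged sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(etype for etype, _ in extract_entities(sentence))
    return counts


# Hypothetical example sentences (not taken from either corpus).
vi_corpus = ['<ENAMEX TYPE="PERSON">Nguyễn Văn A</ENAMEX> sống ở '
             '<ENAMEX TYPE="LOCATION">Hà Nội</ENAMEX>.']
en_corpus = ['<ENAMEX TYPE="PERSON">John Smith</ENAMEX> lives in '
             '<ENAMEX TYPE="LOCATION">London</ENAMEX>.']

if __name__ == "__main__":
    print(extract_entities(vi_corpus[0]))
    print(type_distribution(vi_corpus), type_distribution(en_corpus))
```

A regular expression keeps the sketch dependency-free. A real corpus with nested or overlapping entities would need a proper XML/SGML parser instead.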