A maximum entropy approach to natural language processing
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Head-driven statistical models for natural language parsing
Head-driven statistical models for natural language parsing
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Recovering latent information in treebanks
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Named entity recognition for Vietnamese
ACIIDS'10 Proceedings of the Second international conference on Intelligent information and database systems: Part II
Ripple down rules for part-of-speech tagging
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Using wiktionary to improve lexical disambiguation in multiple languages
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Ripple down rules for vietnamese named entity recognition
ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
Building a Vietnamese Lexicon Ontology for Syntactic Parsing and Document Annotation
Proceedings of International Conference on Information Integration and Web-based Applications & Services
Hi-index | 0.00 |
Treebank is an important resource for both research and application of natural language processing. For Vietnamese, we still lack such kind of corpora. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis. We systematically applied a lot of linguistic techniques to handle such ambiguities. Annotators are supported by automatic-labeling tools and a tree-editor tool. Raw texts are extracted from Tuoi Tre (Youth), an online Vietnamese daily newspaper. The current annotation agreement is around 90 percent.