In this paper, we present several approaches to diacritics restoration in Vietnamese, based on letters and on syllables. We conduct experiments with language-specific feature selection to evaluate the contribution of different feature types. The results show that a combination of AdaBoost and C4.5 with a letter-based feature set achieves 94.7% accuracy, which is competitive with other systems for Vietnamese diacritics restoration. Test data for this task can be collected freely with simple preprocessing, whereas large test sets for many other natural language processing tasks in Vietnamese are lacking. Diacritics restoration could therefore serve as an application-driven evaluation framework for lexical disambiguation tasks.
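As an illustration of the "simple preprocessing" mentioned above, the following minimal Python sketch derives labelled examples from ordinary Vietnamese text by stripping its diacritics and pairing each unaccented letter's context with the original letter. It is not the authors' pipeline: the function names `strip_diacritics` and `letter_examples`, the padding symbol `_`, and the context window size are assumptions made for this example, and the classifier itself (AdaBoost over C4.5 in the paper) is not shown.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with Vietnamese diacritics removed (base letters kept)."""
    # 'đ' and 'Đ' have no Unicode decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def letter_examples(text: str, window: int = 2):
    """Build (context, target) pairs for a letter-based model.

    context: the unaccented letter with `window` unaccented neighbours on
             each side, padded with '_' at the text boundaries (an
             illustrative feature choice, not the paper's feature set).
    target:  the original letter, diacritics included.
    """
    # NFC makes every accented Vietnamese letter a single code point.
    composed = unicodedata.normalize("NFC", text)
    pairs = [(strip_diacritics(ch), ch) for ch in composed]
    # Drop stray combining marks, which strip to an empty string.
    pairs = [(plain, target) for plain, target in pairs if plain]
    plain_text = "".join(plain for plain, _ in pairs)
    padded = "_" * window + plain_text + "_" * window
    return [
        (padded[i : i + 2 * window + 1], target)
        for i, (_, target) in enumerate(pairs)
    ]


if __name__ == "__main__":
    print(strip_diacritics("Tiếng Việt"))
    for context, target in letter_examples("Việt", window=1):
        print(repr(context), "->", target)
```

Because the accented text serves as its own gold standard, any Vietnamese corpus written with diacritics can be turned into test data this way without manual annotation.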