This paper addresses lexical ambiguity, focusing on a particular problem known as accent prediction: given an accentless sequence, we need to restore the correct accents. This can be modelled as a sequence classification problem to which variants of Markov chains can be applied. Although the state space is large (roughly the size of the vocabulary), it is highly constrained when conditioned on the observed data. We investigate the application of several methods, including Powered Product-of-N-grams, Structured Perceptron and Conditional Random Fields (CRFs). We show empirically for Vietnamese that these methods are fairly robust and efficient. Second-order CRFs achieve the best results, with about 94% term accuracy.
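To illustrate how the observation constrains the state space, here is a minimal sketch in Python. It is not one of the models evaluated in the paper: it uses a plain first-order (bigram) Viterbi decoder, and the vocabulary, the scores, and the helper names (`strip_accents`, `build_candidates`, `viterbi`) are all hypothetical choices made for the example. The point it shows is that each accentless token admits only the few accented words that reduce to it, so decoding searches a handful of states per position rather than the whole vocabulary.

```python
import unicodedata
from collections import defaultdict

def strip_accents(word):
    """Map an accented word to its accentless form (the observation)."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 'đ' has no Unicode decomposition, so it is mapped by hand.
    return base.replace("đ", "d").replace("Đ", "D")

def build_candidates(vocab):
    """Group accented words by accentless form: the constrained state sets."""
    cands = defaultdict(list)
    for w in vocab:
        cands[strip_accents(w)].append(w)
    return cands

def viterbi(tokens, cands, bigram_score, unk=-5.0):
    """First-order Viterbi restricted to the candidates of each token.

    bigram_score maps (previous_word, word) to a log-score; unseen pairs
    get the assumed penalty `unk`. Unknown tokens are kept unchanged.
    """
    prev = {"<s>": (0.0, None)}          # state -> (best score, backpointer)
    history = []
    for tok in tokens:
        cur = {}
        for s in cands.get(tok, [tok]):  # only states compatible with tok
            cur[s] = max(
                (p_score + bigram_score.get((p, s), unk), p)
                for p, (p_score, _) in prev.items()
            )
        history.append(cur)
        prev = cur
    # Backtrace from the best final state.
    state = max(prev, key=lambda s: prev[s][0])
    out = []
    for cur in reversed(history):
        out.append(state)
        state = cur[state][1]
    return out[::-1]

# Toy data (made up for illustration, not taken from the paper).
vocab = ["tôi", "tối", "tới", "đi", "học", "hóc"]
cands = build_candidates(vocab)
scores = {("<s>", "tôi"): -1.0, ("tôi", "đi"): -1.0, ("đi", "học"): -1.0}

restored = viterbi("toi di hoc".split(), cands, scores)
print(" ".join(restored))
```

In this toy setup the token "toi" has three candidate states and "di" has one, so each Viterbi step compares at most a few transitions. Replacing the bigram table with learned feature weights is what a structured perceptron or a CRF would do; a second-order model would additionally condition each score on the two preceding words.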