This paper presents a method for diacritic restoration based on learning mechanisms that act at the letter level. The method requires no tagging tools or resources other than raw text, which makes it language-independent and particularly appealing for languages with few available resources. The algorithm was evaluated on four languages (Czech, Hungarian, Polish and Romanian), and an average accuracy of over 98% was observed.
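To make the letter-level idea concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not name the learner or the features, so this sketch substitutes a simple frequency table with back-off over windows of surrounding letters. The function names (`strip_char`, `train`, `restore`), the padding symbol, and the window width are all assumptions chosen for illustration. The only training resource is raw text that already carries its diacritics.

```python
import unicodedata
from collections import Counter, defaultdict

PAD = "_"  # hypothetical padding symbol for positions beyond the text edges


def strip_char(ch):
    """Return the base letter of ch, e.g. 'ă' -> 'a'.

    Letters with no canonical decomposition (such as Polish 'ł')
    are returned unchanged, so this sketch cannot restore them.
    """
    return unicodedata.normalize("NFD", ch)[0]


def strip_text(text):
    """Remove diacritics letter by letter, keeping a 1:1 alignment."""
    text = unicodedata.normalize("NFC", text).lower()
    return "".join(strip_char(ch) for ch in text)


def train(text, max_width=2):
    """Count which original letter appears at the centre of each
    window of stripped letters, for every width from 0 to max_width."""
    text = unicodedata.normalize("NFC", text).lower()
    plain = "".join(strip_char(ch) for ch in text)
    padded = PAD * max_width + plain + PAD * max_width
    model = defaultdict(Counter)
    for i, original in enumerate(text):
        j = i + max_width
        for w in range(max_width + 1):
            model[padded[j - w : j + w + 1]][original] += 1
    return model


def restore(plain, model, max_width=2):
    """Restore diacritics in lowercase, diacritic-free text.

    For each letter, use the widest window seen in training and
    back off to narrower windows; keep the letter if nothing matches.
    """
    padded = PAD * max_width + plain + PAD * max_width
    out = []
    for i, ch in enumerate(plain):
        j = i + max_width
        for w in range(max_width, -1, -1):
            counts = model.get(padded[j - w : j + w + 1])
            if counts:
                out.append(counts.most_common(1)[0][0])
                break
        else:
            out.append(ch)
    return "".join(out)


# Tiny Romanian sample used only to exercise the code; a real model
# would be trained on a large raw-text corpus.
sample = unicodedata.normalize("NFC", "țară și fată")
model = train(sample)
restored = restore(strip_text(sample), model)
```

Because every decision is made per letter from neighbouring letters, nothing here depends on a tokenizer, tagger, or dictionary. Accuracy on real text would depend on corpus size, window width, and the learner, none of which this sketch tries to match to the reported results.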