Learning morpho-lexical probabilities from an untagged corpus with an application to Hebrew
Computational Linguistics
Statistical Language Learning
Similarity-based estimation of word cooccurrence probabilities
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Arabic finite-state morphological analysis and generation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Maximum entropy based restoration of Arabic diacritics
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Multi-agent Based Arabic Speech Recognition
WI-IATW '07 Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops
Arabic diacritic restoration approach based on maximum entropy models
Computer Speech and Language
Applying Finite State Morphology to Conversion Between Roman and Perso-Arabic Writing Systems
Proceedings of the 2009 conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008
Shahmukhi to Gurmukhi transliteration system
COLING '08 22nd International Conference on on Computational Linguistics: Demonstration Papers
A hybrid approach for building Arabic diacritizer
Semitic '09 Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
Arabic diacritization using weighted finite-state transducers
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Automatic diacritization of Arabic for acoustic modeling in speech recognition
Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
Automatic diacritization for low-resource languages using a hybrid word and consonant CMM
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Artificial Intelligence Review
Part of speech tagging for arabic
Natural Language Engineering
Hi-index | 0.00 |
Semitic languages pose a problem to Natural Language Processing since most of the vowels are omitted from written prose, resulting in considerable ambiguity at the word level. However, while reading text, native speakers can generally vocalize each word based on their familiarity with the lexicon and the context of the word. Methods for vowel restoration in previous work involving morphological analysis concentrated on a single language and relied on a parsed corpus that is difficult to create for many Semitic languages. We show that Hidden Markov Models are a useful tool for the task of vowel restoration in Semitic languages. Our technique is simple to implement, does not require any language specific knowledge to be embedded in the model and generalizes well to both Hebrew and Arabic. Using a publicly available version of the Bible and the Qur'an as corpora, we achieve a success rate of 86% for restoring the exact vowel pattern in Arabic and 81% in Hebrew. For Hebrew, we also report on 87% success rate for restoring the correct phonetic value of the words.