We are interested in diacritizing Semitic languages, especially Syriac, using only diacritized texts. Previous methods have required tools such as part-of-speech taggers, segmenters, morphological analyzers, and linguistic rules to produce state-of-the-art results. We present a low-resource, data-driven, and language-independent approach that uses a hybrid word- and consonant-level conditional Markov model. Our approach rivals the best previously published results in Arabic (a word error rate, WER, of 15% with case endings) without the use of a morphological analyzer. In Syriac, we reduce the WER over a strong baseline by 30% to achieve a WER of 10.5%. We also report results for Hebrew and English.
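The WER figures above count a word as wrong when its predicted diacritized form differs from the reference in any way. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes whitespace tokenization and a one-to-one alignment between reference and predicted words, and the function name `word_error_rate` and the transliterated example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the fraction of words whose diacritized form differs from the reference.

    Assumption: diacritization does not change the number of words, so the
    two strings align word by word after splitting on whitespace.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must contain the same number of words")
    if not ref_words:
        return 0.0
    # A word counts as an error if any diacritic (or any character) differs.
    errors = sum(ref != hyp for ref, hyp in zip(ref_words, hyp_words))
    return errors / len(ref_words)


# Hypothetical transliterated example: the second word has a wrong case ending.
reference = "kataba alwaladu aldarsa"
hypothesis = "kataba alwalada aldarsa"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

Because a single wrong case ending makes the whole word an error, WER with case endings is a stricter measure than WER computed with the final diacritic ignored.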