Grammatical Agreement and Automatic Morphological Disambiguation of Inflectional Languages
TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
Multitiered nonlinear morphology using multitape finite automata: a case study on Syriac and Arabic
Computational Linguistics - Special issue on finite-state methods in NLP
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Enriching the knowledge sources used in a maximum entropy part-of-speech tagger
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Context-based morphological disambiguation with random fields
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Part-of-speech tagging of modern hebrew text
Natural Language Engineering
Identifying semitic roots: Machine learning with linguistic constraints
Computational Linguistics
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Automatic tagging of Arabic text: from raw text to base phrase chunks
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Arabic diacritization through full morphological tagging
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Smoothing a lexicon-based POS tagger for Arabic and Hebrew
Semitic '07 Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Automatic diacritization for low-resource languages using a hybrid word and consonant CMM
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Is Arabic part of speech tagging feasible without word segmentation?
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
We define a probabilistic morphological analyzer using a data-driven approach for Syriac in order to facilitate the creation of an annotated corpus. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. We introduce novel probabilistic models for segmentation, dictionary linkage, and morphological tagging and connect them in a pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of models with varying amounts of training data and find that with about 34,500 labeled tokens, we can outperform a reasonable baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, our joint model achieves 86.47% accuracy, a 29.7% reduction in error rate over the baseline.