Accurate unlexicalized parsing
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
On the parameter space of generative lexicalized statistical parsing models
On the parameter space of generative lexicalized statistical parsing models
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Maximum entropy based restoration of Arabic diacritics
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Simultaneous tokenization and part-of-speech tagging for Arabic without a morphological analyzer
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
A new approach to lexical disambiguation of Arabic text
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Better Arabic parsing: baselines, evaluations, and analysis
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Hi-index | 0.00 |
Research on the problem of morphological disambiguation of Arabic has noted that techniques developed for lexical disambiguation in English do not easily transfer over, since the affixation present in Arabic creates a very different tag set than for English, encoding both inflectional morphology and more complex tokenization sequences. This work takes a new approach to this problem based on a distinction between the open-class and closed-class categories of tokens, which differ both in their frequencies and in their possible morphological affixations. This separation simplifies the morphological analysis problem considerably, making it possible to use a Conditional Random Field model for joint tokenization and “core” part-of-speech tagging of the open-class items, while the closed-class items are handled by regular expressions. This work is therefore situated between data-driven approaches and those that use a morphological analyzer. For the tasks of tokenization and core part-of-speech tagging, the resulting system outperforms, on the given test set, a system that incorporates a morphological analyzer. We also evaluate the effects of the differences on parser performance when the tagger output is used for parser input.