Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
A new approach to lexical disambiguation of Arabic text
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Exploiting Separation of Closed-Class Categories for Arabic Tokenization and Part-of-Speech Tagging
ACM Transactions on Asian Language Information Processing (TALIP)
Developing a competitive HMM Arabic POS tagger using small training corpora
ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part I
We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating closed-class items from open-class items and on focusing on the likelihood of the possible stems of the open-class words. Encoding some basic linguistic information simplifies the machine learning task. The approach achieves state-of-the-art tokenization results and competitive POS-tagging results, although with a reduced tag set and some evaluation difficulties.
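The abstract only outlines the idea, so what follows is a toy Python sketch under stated assumptions, not the paper's model. The proclitic and enclitic inventories (in Buckwalter transliteration), the stem counts in `STEM_COUNTS`, and the helpers `split_proclitics`, `stem_log_prob`, and `segment` are all hypothetical illustrations. Closed-class clitics are peeled off by lookup against a small fixed list, and among the resulting candidates the one whose open-class stem has the highest smoothed likelihood wins. POS tagging is left out.

```python
import math

# Hypothetical closed-class inventories (Buckwalter transliteration).
# Small illustrative lists only, not the inventories used in the paper.
PROCLITICS = ["w", "f", "b", "l", "Al"]  # and, so/then, with/in, for/to, the
ENCLITICS = ["hm", "hA", "h", "k"]       # their/them, her, his, your

# Hypothetical stem counts standing in for open-class stem likelihoods
# that would be estimated from a training corpus.
STEM_COUNTS = {"ktAb": 50, "qlm": 20, "byt": 30, "wld": 15}  # book, pen, house, boy


def stem_log_prob(stem, counts=STEM_COUNTS):
    """Add-one smoothed log-probability of a candidate open-class stem."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen stems
    return math.log((counts.get(stem, 0) + 1) / (total + vocab))


def split_proclitics(word, max_count=3):
    """Yield (prefix_list, remainder) for every way of peeling proclitics off the front."""
    yield [], word
    if max_count == 0:
        return
    for p in PROCLITICS:
        # Only strip a proclitic if something is left over for the stem.
        if word.startswith(p) and len(word) > len(p):
            for more, remainder in split_proclitics(word[len(p):], max_count - 1):
                yield [p] + more, remainder


def candidate_segmentations(word):
    """Yield (proclitics, stem, enclitic) splits built from closed-class items."""
    for prefixes, remainder in split_proclitics(word):
        yield tuple(prefixes), remainder, None
        for e in ENCLITICS:
            if remainder.endswith(e) and len(remainder) > len(e):
                yield tuple(prefixes), remainder[: -len(e)], e


def segment(word):
    """Pick the split whose stem is most likely; ties keep the earliest (least split) candidate."""
    return max(candidate_segmentations(word), key=lambda c: stem_log_prob(c[1]))


if __name__ == "__main__":
    for w in ["wktAbhm", "Alqlm", "byt", "lwld"]:
        print(w, segment(w))
```

Keeping the closed-class inventory as a hand-written list means the learned component only has to rank open-class stems. For example, "byt" ("house") stays whole because its stem is attested, even though it begins with the proclitic "b", while "lwld" splits as "l" + "wld".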