Foundations of statistical natural language processing
Foundations of statistical natural language processing
Coping with ambiguity and unknown words through probabilistic models
Computational Linguistics - Special issue on using large corpora: II
A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Chinese and Japanese word segmentation using word-level and character-level information
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Automatic tagging of Arabic text: from raw text to base phrase chunks
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Choosing an optimal architecture for segmentation and POS-tagging of modern Hebrew
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Developing an Arabic treebank: methods, guidelines, procedures, and tools
Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
A probabilistic morphological analyzer for Syriac
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
An efficient part-of-speech tagger for arabic
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Hi-index | 0.00 |
We propose an enhanced Part-of-Speech (POS) tagger of Semitic languages that treats Modern Standard Arabic (henceforth Arabic) and Modern Hebrew (henceforth Hebrew) using the same probabilistic model and architectural setting. We start out by porting an existing Hidden Markov Model POS tagger for Hebrew to Arabic by exchanging a morphological analyzer for Hebrew with Buckwalter's (2002) morphological analyzer for Arabic. This gives state-of-the-art accuracy (96.12%), comparable to Habash and Rambow's (2005) analyzer-based POS tagger on the same Arabic datasets. However, further improvement of such analyzer-based tagging methods is hindered by the incomplete coverage of standard morphological analyzer (Bar Haim et al., 2005). To overcome this coverage problem we supplement the output of Buckwalter's analyzer with synthetically constructed analyses that are proposed by a model which uses character information (Diab et al., 2004) in a way that is similar to Nakagawa's (2004) system for Chinese and Japanese. A version of this extended model that (unlike Nakagawa) incorporates synthetically constructed analyses also for known words achieves 96.28% accuracy on the standard Arabic test set.