An Efficient Part-of-Speech Tagger for Arabic

Authors:
Selçuk Köprü
Affiliations:
Teknoloji Yazılımevi, Ltd., METU Technopolis, Ankara, TR
Venue:
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Year:
2011

Abstract

In this paper, we present an efficient part-of-speech (POS) tagger for Arabic based on a Hidden Markov Model (HMM). We explore several enhancements to improve the baseline system. Despite the morphological complexity of Arabic, our approach is data-driven: unlike many other Arabic POS taggers, it does not use a morphological analyzer or a lexicon. This makes the approach simple, very efficient, and well suited to real-life applications, while its accuracy remains comparable to that of other Arabic POS taggers. In the experiments, we also thoroughly investigate aspects of Arabic POS tagging that have not previously been examined in detail, including tag sets and prefix and suffix analyses. Our POS tagger achieves an accuracy of 95.57% on a standard Arabic tag set. A detailed error analysis is provided to better evaluate the system. We also applied the same approach to other languages, such as Farsi and German, to show that it is language-independent, and we report accuracy rates for these languages as well.
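The abstract states only that the tagger is HMM-based, uses no lexicon or morphological analyzer, and studies prefix and suffix analyses. To illustrate that general technique, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding in Python. The bigram order, add-one smoothing, two-character affixes, and the naive product of prefix and suffix likelihoods for unseen words are all assumptions made for this example, not details taken from the paper. The class name `BigramHMMTagger` and the toy training sentences (Buckwalter-style transliteration) are hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag for the position before the first word


class BigramHMMTagger:
    """Hypothetical bigram HMM POS tagger with a prefix/suffix model for unseen words."""

    def __init__(self, affix_len=2):
        self.affix_len = affix_len          # assumed affix length (characters)
        self.trans = defaultdict(Counter)   # trans[prev_tag][tag] -> count
        self.emit = defaultdict(Counter)    # emit[tag][word] -> count
        self.prefix = defaultdict(Counter)  # prefix[tag][word[:k]] -> count
        self.suffix = defaultdict(Counter)  # suffix[tag][word[-k:]] -> count
        self.tag_count = Counter()
        self.vocab = set()
        self.tags = []

    def train(self, sentences):
        k = self.affix_len
        for sent in sentences:
            prev = START
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.prefix[tag][word[:k]] += 1
                self.suffix[tag][word[-k:]] += 1
                self.tag_count[tag] += 1
                self.vocab.add(word)
                prev = tag
        self.tags = sorted(self.tag_count)
        return self

    def log_trans(self, prev, tag):
        # Add-one smoothing: every tag bigram gets a nonzero probability.
        row = self.trans[prev]
        return math.log((row[tag] + 1) / (sum(row.values()) + len(self.tags)))

    def log_emit(self, tag, word):
        n = self.tag_count[tag]
        if word in self.vocab:
            c = self.emit[tag][word]
            return math.log(c / n) if c else -math.inf
        # Unseen word: product of add-one-smoothed prefix and suffix likelihoods
        # (one extra slot is reserved for affixes never seen with this tag).
        k = self.affix_len
        p = (self.prefix[tag][word[:k]] + 1) / (n + len(self.prefix[tag]) + 1)
        s = (self.suffix[tag][word[-k:]] + 1) / (n + len(self.suffix[tag]) + 1)
        return math.log(p) + math.log(s)

    def tag(self, words):
        if not words:
            return []
        # best[i][t]: highest log-probability of any tag path ending in t at word i.
        best = [{t: self.log_trans(START, t) + self.log_emit(t, words[0]) for t in self.tags}]
        back = [{}]
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in self.tags:
                prev = max(self.tags, key=lambda p: best[i - 1][p] + self.log_trans(p, t))
                best[i][t] = best[i - 1][prev] + self.log_trans(prev, t) + self.log_emit(t, words[i])
                back[i][t] = prev
        # Trace back-pointers from the best final tag.
        path = [max(self.tags, key=lambda t: best[-1][t])]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]


# Toy training data in Buckwalter-style transliteration (made up for illustration).
TRAIN = [
    [("ktb", "VERB"), ("Alwld", "NOUN"), ("Aldrs", "NOUN")],
    [("Drb", "VERB"), ("Albnt", "NOUN"), ("Alklb", "NOUN")],
    [("ktb", "VERB"), ("Albnt", "NOUN"), ("fy", "PREP"), ("Albyt", "NOUN")],
]

if __name__ == "__main__":
    tagger = BigramHMMTagger().train(TRAIN)
    print(tagger.tag(["Drb", "Alrjl"]))  # "Alrjl" is unseen in TRAIN
```

Running the script should print `['VERB', 'NOUN']`. The unseen word `Alrjl` is tagged NOUN because its `Al` prefix (the definite article) occurs only on nouns in the toy corpus. This shows how affix statistics can stand in for a lexicon when handling unknown words.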