Improving Arabic part-of-speech tagging through morphological analysis

Authors:
Mohammed Albared;Nazlia Omar;Mohd. Juzaiddin Ab Aziz
Affiliations:
University Kebangsaan Malaysia, Faculty of Information Science and Technology, Department of Computer Science;University Kebangsaan Malaysia, Faculty of Information Science and Technology, Department of Computer Science;University Kebangsaan Malaysia, Faculty of Information Science and Technology, Department of Computer Science
Venue:
ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part I
Year:
2011

Abstract

This paper describes our newly developed second-order hidden Markov model (HMM) part-of-speech (POS) tagging system, specially designed to tag Arabic texts using small training data. The tagger achieves encouraging results. In addition, the paper presents a hybrid tagging architecture for Arabic in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the tagger's results when it is used standalone and when it is combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves the POS tagging accuracy of unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.
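The abstract combines two ideas: a second-order (trigram) HMM tagger, and a weighted morphological analyzer that supplies candidate tags for words never seen in training. The Python sketch below illustrates that general architecture under stated assumptions; it is not the authors' implementation. The fixed interpolation weights, the analyzer's hypothetical `word -> {tag: weight}` dictionary format, and the use of analyzer weights directly as emission scores for unknown words are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # padding tag for the two positions before a sentence


class TrigramHMMTagger:
    """Minimal second-order (trigram) HMM POS tagger with an optional
    morphological-analyzer back-off for unknown words (illustrative sketch)."""

    def __init__(self, lambdas=(0.1, 0.3, 0.6)):
        # Hypothetical fixed interpolation weights (unigram, bigram, trigram).
        self.l1, self.l2, self.l3 = lambdas
        self.tag_count = Counter()           # C(t)
        self.bigram = Counter()              # C(t2, t3)
        self.trigram = Counter()             # C(t1, t2, t3)
        self.hist1 = Counter()               # C(t2) as a bigram history
        self.hist2 = Counter()               # C(t1, t2) as a trigram history
        self.lexicon = defaultdict(Counter)  # word -> Counter of tags
        self.n_tokens = 0

    def train(self, sentences):
        # Each sentence is a list of (word, tag) pairs.
        for sent in sentences:
            tags = [START, START] + [t for _, t in sent]
            for word, tag in sent:
                self.lexicon[word][tag] += 1
                self.tag_count[tag] += 1
                self.n_tokens += 1
            for i in range(2, len(tags)):
                t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
                self.bigram[(t2, t3)] += 1
                self.trigram[(t1, t2, t3)] += 1
                self.hist1[t2] += 1
                self.hist2[(t1, t2)] += 1

    def transition(self, t1, t2, t3):
        # Linearly interpolated estimate of P(t3 | t1, t2).
        p1 = self.tag_count[t3] / self.n_tokens
        p2 = self.bigram[(t2, t3)] / self.hist1[t2] if self.hist1[t2] else 0.0
        h = self.hist2[(t1, t2)]
        p3 = self.trigram[(t1, t2, t3)] / h if h else 0.0
        return self.l1 * p1 + self.l2 * p2 + self.l3 * p3

    def candidates(self, word, analyzer):
        if word in self.lexicon:  # known word: tags seen in training
            return list(self.lexicon[word])
        if analyzer and word in analyzer:  # unknown word: analyzer proposals
            tags = [t for t, w in analyzer[word].items()
                    if w > 0 and t in self.tag_count]
            if tags:
                return tags
        return list(self.tag_count)  # no information: every training tag

    def emission(self, word, tag, analyzer):
        if word in self.lexicon:  # relative frequency P(word | tag)
            return self.lexicon[word][tag] / self.tag_count[tag]
        # Unknown word: use the analyzer weight if any, else a uniform score.
        weight = analyzer.get(word, {}).get(tag) if analyzer else None
        return weight if weight else 1.0 / len(self.tag_count)

    def tag(self, words, analyzer=None):
        # Viterbi over states (previous tag, current tag); keep the best path per state.
        best = {(START, START): (0.0, [])}
        for word in words:
            new = {}
            for (t1, t2), (score, path) in best.items():
                for t3 in self.candidates(word, analyzer):
                    p = self.transition(t1, t2, t3) * self.emission(word, t3, analyzer)
                    if p <= 0.0:
                        continue
                    s = score + math.log(p)
                    if (t2, t3) not in new or s > new[(t2, t3)][0]:
                        new[(t2, t3)] = (s, path + [t3])
            best = new
        return max(best.values(), key=lambda v: v[0])[1] if best else []
```

In this sketch the analyzer matters only for out-of-vocabulary words: it narrows the candidate tag set and weights each candidate, which is one simple way to let morphological knowledge compensate for a small training corpus. For example, after training on a few romanized toy sentences, `tagger.tag(["kataba", "alrajulu"], analyzer={"alrajulu": {"IN": 1.0}})` restricts the unknown word to the single analyzer-proposed tag.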