Identifying broken plurals, irregular gender, and rationality in Arabic text

Authors:
Sarah Alkuhlani;Nizar Habash
Affiliations:
Center for Computational Learning Systems Columbia University;Center for Computational Learning Systems Columbia University
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

Arabic morphology is complex, partly because of its richness and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns) and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morpho-syntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques: simple maximum likelihood estimation (MLE) with back-off, and a support vector machine based sequence tagger (Yamcha). We study a number of orthographic, morphological and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique performs best for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features, and that syntactic features help even more. A combination of the two techniques improves overall performance even further.
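
As a rough illustration of the MLE-with-back-off idea and of combining it with a second predictor for unseen words, a minimal sketch follows. It is not the authors' implementation. The back-off order (word plus part of speech, then part of speech alone, then the overall majority label), the label strings such as "M-P-I", the transliterated toy words, and the names BackoffMLE, combined_predict and unseen_word_tagger are all assumptions made here for the example. The abstract does not specify any of them, and the SVM sequence tagger is represented only by an arbitrary callable.

```python
from collections import Counter, defaultdict

# Toy training data invented for illustration: (word, part of speech, label).
# Labels are hypothetical gender-number-rationality strings, e.g. "M-P-I" for
# masculine, plural, irrational. They are not the paper's tag set.
TOY_TRAINING = [
    ("kitAb", "NOUN", "M-S-I"),     # "book"
    ("qalam", "NOUN", "M-S-I"),     # "pen"
    ("kutub", "NOUN", "M-P-I"),     # "books", a broken plural
    ("rijAl", "NOUN", "M-P-R"),     # "men", a broken plural
    ("madrasap", "NOUN", "F-S-I"),  # "school"
]


class BackoffMLE:
    """Most-frequent-label predictor with an assumed two-step back-off."""

    def __init__(self):
        self.by_word_pos = defaultdict(Counter)  # (word, pos) -> label counts
        self.by_pos = defaultdict(Counter)       # pos -> label counts
        self.overall = Counter()                 # label counts over all tokens

    def train(self, examples):
        for word, pos, label in examples:
            self.by_word_pos[(word, pos)][label] += 1
            self.by_pos[pos][label] += 1
            self.overall[label] += 1

    def predict(self, word, pos):
        """Return (label, level); level names the table that answered."""
        if not self.overall:
            raise ValueError("predict() called before train()")
        if (word, pos) in self.by_word_pos:
            return self.by_word_pos[(word, pos)].most_common(1)[0][0], "word+pos"
        if pos in self.by_pos:
            return self.by_pos[pos].most_common(1)[0][0], "pos"
        return self.overall.most_common(1)[0][0], "overall"


def combined_predict(mle, unseen_word_tagger, word, pos):
    """Use the MLE label for seen words, otherwise defer to another tagger.

    unseen_word_tagger is a hypothetical callable (word, pos) -> label that
    stands in for a trained sequence tagger.
    """
    label, level = mle.predict(word, pos)
    if level == "word+pos":
        return label
    return unseen_word_tagger(word, pos)
```

In this sketch a word counts as "seen" only when the exact word and part-of-speech pair occurred in training; anything that would otherwise fall through to a back-off level is handed to the second predictor instead, which mirrors the division of labor the abstract reports between the two techniques.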