Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features

Authors:
Yuval Marton; Nizar Habash; Owen Rambow
Affiliations:
Nuance Communications; Center for Computational Learning Systems, Columbia University; Center for Computational Learning Systems, Columbia University
Venue:
Computational Linguistics
Year:
2013

Abstract

We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language with complex agreement patterns. Using controlled experiments, we contrast the contribution of different part-of-speech (POS) tag sets and morphological features in two input conditions: a machine-predicted condition, in which POS tags and morphological feature values are assigned automatically, and a gold condition, in which their true values are known. We find that more informative, fine-grained tag sets are useful in the gold condition but may be detrimental in the predicted condition, where they are outperformed by simpler but more accurately predicted tag sets. We identify a set of features (definiteness, person, number, gender, and undiacritized lemma) that improve parsing quality in the predicted condition, whereas other features are more useful in the gold condition. We are the first to show that functional features for gender and number (e.g., "broken plurals"), and optionally the related rationality ("humanness") feature, are more helpful for parsing than form-based gender and number. Finally, we show that parsing quality in the predicted condition can improve dramatically by training in a combined gold+predicted condition. We experimented with two transition-based parsers, MaltParser and Easy-First Parser. Our findings are robust across parsers, models, and input conditions. This suggests that the contribution of the linguistic knowledge in the tag sets and features we identified goes beyond particular experimental settings and may be informative for other parsers and other morphologically rich languages.
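Two ideas in the abstract are easy to picture on CoNLL-X-style parser input: restricting the morphological FEATS column to the subset reported as helpful in the predicted condition, and building a combined gold+predicted training set. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the feature names (`det`, `per`, `num`, `gen`, `lemma_undiac`, `cas`), the helper functions, and the dictionary representation of tokens are hypothetical, and reading "combined gold+predicted" as a plain concatenation of a gold-tagged and a predicted-tagged copy of the training sentences is an assumption; the paper's exact procedure may differ. The example token is the broken plural kutub 'books', which carries no plural suffix, so its form-based number is singular while its functional number is plural.

```python
from typing import Dict, List, Sequence

# Hypothetical names for the feature subset the abstract reports as helpful in
# the predicted condition: definiteness, person, number, gender, undiacritized lemma.
PREDICTED_FEATURES = ("det", "per", "num", "gen", "lemma_undiac")

Token = Dict[str, object]   # assumed shape: {"form": ..., "pos": ..., "feats": {...}}
Sentence = List[Token]


def feats_column(feats: Dict[str, str], keep: Sequence[str] = PREDICTED_FEATURES) -> str:
    """Render only the kept features as a CoNLL-X style FEATS value ("_" if none)."""
    pairs = [f"{name}={feats[name]}" for name in keep if name in feats]
    return "|".join(pairs) if pairs else "_"


def combined_training_set(gold: List[Sentence], predicted: List[Sentence]) -> List[Sentence]:
    """Assumed reading of gold+predicted training: concatenate a gold-tagged copy
    and a predicted-tagged copy of the same sentences (heads/labels stay gold)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted copies must align sentence by sentence")
    return list(gold) + list(predicted)


# Broken plural kutub 'books' (undiacritized lemma ktAb); feature values are
# illustrative. Form-based number is singular, functional number is plural.
# Case ("cas") lies outside the kept subset, so it is dropped from FEATS.
form_based = {"det": "n", "per": "3", "num": "s", "gen": "m", "lemma_undiac": "ktAb", "cas": "n"}
functional = {**form_based, "num": "p"}

print(feats_column(form_based))   # det=n|per=3|num=s|gen=m|lemma_undiac=ktAb
print(feats_column(functional))   # det=n|per=3|num=p|gen=m|lemma_undiac=ktAb
```

Keeping the feature subset as a single tuple makes it straightforward to compare alternative feature sets, for example swapping form-based for functional number and gender, without touching the rest of the data preparation.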