In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other treebanks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human-interpretable grammar that is competitive with a latent-variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2–5% F1.