Parsing noun phrases in the penn treebank

Authors:
David Vadas;James R. Curran
Affiliations:
University of Sydney;University of Sydney
Venue:
Computational Linguistics
Year:
2011

Citing 35
Cited 2

Procedure for quantitatively comparing the syntactic coverage of English grammars

HLT '91 Proceedings of the workshop on Speech and Natural Language
Natural language parsing as statistical pattern recognition

Natural language parsing as statistical pattern recognition
Empirical methods for artificial intelligence

Empirical methods for artificial intelligence
The syntactic process

The syntactic process
Theory of Syntactic Recognition for Natural Languages

Theory of Syntactic Recognition for Natural Languages
Using Latent Semantic Indexing as a Measure of Conceptual Association for Noun Compound Disambiguation

AICS '02 Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science
A Trainable Bracketer for Noun Modifiers

AI '98 Proceedings of the 12th Biennial Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
Head-driven statistical models for natural language parsing

Head-driven statistical models for natural language parsing
Structural ambiguity and lexical relations

Computational Linguistics - Special issue on using large corpora: I
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
PCFG models of linguistic tree representations

Computational Linguistics
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Acquiring disambiguation rules from text

ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Statistical decision-tree models for parsing

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A new statistical parser based on bigram lexical dependencies

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Noun phrase translation

Noun phrase translation
The LinGO Redwoods treebank motivation and preliminary applications

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
Recovering latent information in treebanks

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Parsing with treebank grammars: empirical bounds, theoretical models, and the structure of the Penn Treebank

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Parsing the wall street journal using a Lexical-Functional Grammar and discriminative estimation techniques

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
The Penn Treebank: annotating predicate argument structure

HLT '94 Proceedings of the workshop on Human Language Technology
On the parameter space of generative lexicalized statistical parsing models

On the parameter space of generative lexicalized statistical parsing models
Programming languages and their compilers: Preliminary notes

Programming languages and their compilers: Preliminary notes
Head-Driven Statistical Models for Natural Language Parsing

Computational Linguistics
Generalized multitext grammars

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Synchronous binarization for machine translation

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Evaluating the accuracy of an unlexicalized statistical parser on the PARC DepBank

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Prepositional phrase attachment without oracles

Computational Linguistics
Determining the syntactic structure of medical terms in clinical notes

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
On the semantics of noun compounds

Computer Speech and Language
Search engine statistics beyond the n-gram: application to noun compound bracketing

CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Statistical parsing with a context-free grammar and word statistics

AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence

Unified dependency parsing of Chinese morphological and syntactic structures

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
How many multiword expressions do people know?

ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

Noun phrases (nps) are a crucial part of natural language, and can have a very complex structure. However, this np structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse nps, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (nlp) tasks. We comprehensively solve this problem by manually annotating np structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult, and demonstrate that consistent np annotation is possible. Our gold-standard np data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of np structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through much experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale np Bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex nps that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets nps identified by the Bikel (2004) parser. Our np Bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus, and show that many nlp applications can now make use of np structure.