Procedure for quantitatively comparing the syntactic coverage of English grammars
HLT '91 Proceedings of the workshop on Speech and Natural Language
Natural language parsing as statistical pattern recognition
Empirical methods for artificial intelligence
The syntactic process
Theory of Syntactic Recognition for Natural Languages
AICS '02 Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science
A Trainable Bracketer for Noun Modifiers
AI '98 Proceedings of the 12th Biennial Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
Head-driven statistical models for natural language parsing
Structural ambiguity and lexical relations
Computational Linguistics - Special issue on using large corpora: I
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
PCFG models of linguistic tree representations
Computational Linguistics
A maximum-entropy-inspired parser
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Acquiring disambiguation rules from text
ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Statistical decision-tree models for parsing
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A new statistical parser based on bigram lexical dependencies
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Noun phrase translation
The LinGO Redwoods treebank: motivation and preliminary applications
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
Recovering latent information in treebanks
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
The Penn Treebank: annotating predicate argument structure
HLT '94 Proceedings of the workshop on Human Language Technology
On the parameter space of generative lexicalized statistical parsing models
Programming languages and their compilers: Preliminary notes
Head-Driven Statistical Models for Natural Language Parsing
Computational Linguistics
Generalized multitext grammars
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Synchronous binarization for machine translation
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Evaluating the accuracy of an unlexicalized statistical parser on the PARC DepBank
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Prepositional phrase attachment without oracles
Computational Linguistics
Determining the syntactic structure of medical terms in clinical notes
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
On the semantics of noun compounds
Computer Speech and Language
Search engine statistics beyond the n-gram: application to noun compound bracketing
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Statistical parsing with a context-free grammar and word statistics
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Unified dependency parsing of Chinese morphological and syntactic structures
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
How many multiword expressions do people know?
ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 1
Noun phrases (NPs) are a crucial part of natural language and can have a very complex structure. However, this NP structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use with any parser. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance: the parser's F-score is up to 5.69% lower than that of a baseline that uses deterministic rules. Through extensive experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem, we construct a wide-coverage, large-scale NP Bracketing system. Using our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves 93.8% F-score on the simple task that most previous work has undertaken, and that extends to bracketing longer, more complex NPs that are rarely dealt with in the literature, attaining 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser.
Our NP Bracketing model includes a wide variety of features that supply the lexical information missing in the parser experiments; as a result, its F-score is 9.04% higher than the parser's. These experiments demonstrate the utility of the corpus and show that many NLP applications can now make use of NP structure.
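The F-scores reported above measure how many NP-internal brackets a system recovers. A minimal sketch of that computation follows, assuming the usual PARSEVAL-style definition (the harmonic mean of bracket precision and recall, micro-averaged over all NPs) and representing each bracket as a half-open token span. The names `bracket_prf` and `right_branching` are hypothetical; `right_branching` is an illustrative deterministic rule, not the baseline rules used in the experiments, and the example analysis is a hand-made illustration.

```python
from typing import Iterable, Set, Tuple

# A bracket is a half-open token span [start, end) inside a single NP.
Span = Tuple[int, int]


def bracket_prf(
    pairs: Iterable[Tuple[Set[Span], Set[Span]]]
) -> Tuple[float, float, float]:
    """Micro-averaged bracket precision, recall and F-score.

    `pairs` holds one (gold, predicted) pair of bracket sets per NP.
    """
    correct = n_gold = n_pred = 0
    for gold, predicted in pairs:
        correct += len(gold & predicted)  # brackets found in both analyses
        n_gold += len(gold)
        n_pred += len(predicted)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def right_branching(n_tokens: int) -> Set[Span]:
    """Hypothetical deterministic baseline: (w0 (w1 (w2 ... ))).

    Brackets every suffix that starts after the first token and spans at
    least two tokens; the bracket over the whole NP is not counted.
    """
    return {(start, n_tokens) for start in range(1, n_tokens - 1)}


if __name__ == "__main__":
    # Hypothetical gold analysis of "New York stock exchange":
    # ((New York) (stock exchange)) -> spans [0, 2) and [2, 4)
    gold = {(0, 2), (2, 4)}
    baseline = right_branching(4)  # {(1, 4), (2, 4)}
    p, r, f = bracket_prf([(gold, baseline)])
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.50 F=0.50
```

Since the scorer only compares two sets of spans, the same function could score one annotator's brackets against another's, which is one common way to report inter-annotator agreement; treat that use as an assumption rather than a description of how the agreement figures above were obtained.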
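The abstract traces the parser's weak NP bracketing to missing lexical information. The smallest case where such information matters is a three-word noun compound, which is either left-bracketed, as in ((liver cell) antibody), or right-bracketed, as in (liver (cell line)). The sketch below illustrates the idea with the simple unsupervised adjacency heuristic from the noun-compound bracketing literature, which compares corpus counts for the pairs (w1, w2) and (w2, w3). It is a deliberately simplified stand-in, not the supervised, feature-rich model described above; the function names and the toy corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Compound = Tuple[str, str, str]


def count_bigrams(sentences: Iterable[str]) -> Counter:
    """Count adjacent word pairs in whitespace-tokenised, lowercased text."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def adjacency_bracket(compound: Compound, bigrams: Counter) -> str:
    """Return "left" for ((w1 w2) w3) or "right" for (w1 (w2 w3)).

    The more frequent adjacent pair is grouped first; ties go to "left",
    an arbitrary choice made for this sketch.
    """
    w1, w2, w3 = compound
    return "left" if bigrams[(w1, w2)] >= bigrams[(w2, w3)] else "right"


def render(compound: Compound, decision: str) -> str:
    """Write the compound with explicit brackets."""
    w1, w2, w3 = compound
    if decision == "left":
        return f"(({w1} {w2}) {w3})"
    return f"({w1} ({w2} {w3}))"


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for large-scale lexical statistics.
    corpus = [
        "liver cell damage was observed",
        "each liver cell responds",
        "a new cell line was cultured",
        "the cell line grew",
        "another cell line failed",
        "cell antibody binding was weak",
    ]
    counts = count_bigrams(corpus)
    for compound in [("liver", "cell", "antibody"), ("liver", "cell", "line")]:
        print(render(compound, adjacency_bracket(compound, counts)))
    # ((liver cell) antibody)
    # (liver (cell line))
```

Raw co-occurrence counts favour frequent words, so a more careful version would normalise them with an association measure; the model described in the abstract instead learns from many lexical features with supervision.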