Using web-scale N-grams to improve base NP parsing performance

Authors:
Emily Pitler;Shane Bergsma;Dekang Lin;Kenneth Church
Affiliations:
University of Pennsylvania;University of Alberta;Google, Inc.;Johns Hopkins University
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Year:
2010



Abstract

We use web-scale N-grams in a base NP parser that correctly analyzes 95.4% of the base NPs in natural text. Web-scale data improves performance. That is, there is no data like more data. Performance scales log-linearly with the number of parameters in the model (the number of unique N-grams). The web-scale N-grams are particularly helpful in harder cases, such as NPs that contain conjunctions.
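
To make the log-linear scaling claim concrete, here is a minimal sketch of what such a relationship looks like, assuming it takes the form score = a + b * log10(number of unique N-grams). The function name `fit_log_linear`, the constants `TRUE_A` and `TRUE_B`, and the data points are hypothetical and synthetic; they are generated inside the code for illustration and are not results from the paper.

```python
import math

def fit_log_linear(sizes, scores):
    """Least-squares fit of score = a + b * log10(size)."""
    xs = [math.log10(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx            # gain in score per tenfold increase in size
    a = mean_y - b * mean_x  # intercept
    return a, b

# Synthetic placeholder data (NOT from the paper): scores are generated
# from a known log-linear curve so the fit can be checked against it.
TRUE_A, TRUE_B = 80.0, 1.5
sizes = [10 ** k for k in range(4, 10)]  # hypothetical unique N-gram counts
scores = [TRUE_A + TRUE_B * math.log10(n) for n in sizes]

a, b = fit_log_linear(sizes, scores)
print(f"intercept a = {a:.3f}, slope b = {b:.3f} per tenfold increase")
```

Under this assumed form, each tenfold increase in the number of unique N-grams adds a constant amount `b` to the score, which is the sense in which more data keeps helping, but with diminishing returns per added N-gram.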