Attacking parsing bottlenecks with unlabeled data and relevant factorizations

Authors:
Emily Pitler
Affiliations:
University of Pennsylvania, Philadelphia, PA
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012

Abstract

Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across a range of existing parsers, these two categories have the lowest accuracies, and the resulting mistakes have consequences for downstream applications. Resolving prepositions and conjunctions correctly is often assumed to depend on lexical dependencies. Because lexical statistics drawn only from the training set are sparse, unlabeled data can help ameliorate this sparsity problem. By incorporating features derived from unlabeled data into a factorization of the problem that matches the representation of prepositions and conjunctions, we achieve a new state of the art for English dependencies, with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.
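
The attachment figures quoted in the abstract can be read as the fraction of tokens whose predicted head matches the gold head, optionally restricted to one part-of-speech category. The sketch below illustrates that reading only; it is not the paper's evaluation script. The function name `attachment_accuracy`, the toy sentence, its head indices, and the choice to count every token (with no special handling of punctuation) are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def attachment_accuracy(
    gold_heads: Sequence[int],
    pred_heads: Sequence[int],
    tags: Sequence[str],
    category: Optional[str] = None,
) -> Optional[float]:
    """Fraction of tokens whose predicted head equals the gold head.

    If `category` is given, only tokens carrying that tag are counted.
    Returns None when no token matches the requested category.
    """
    if not (len(gold_heads) == len(pred_heads) == len(tags)):
        raise ValueError("gold_heads, pred_heads and tags must have equal length")

    # Indices of the tokens that take part in the score.
    selected = [i for i, tag in enumerate(tags) if category is None or tag == category]
    if not selected:
        return None

    correct = sum(1 for i in selected if gold_heads[i] == pred_heads[i])
    return correct / len(selected)


# Hypothetical toy sentence: "She ate pasta with a fork and spoon".
# Heads are 1-based token positions, 0 marks the root; the annotation
# convention used here is illustrative, not taken from the paper.
tags = ["PRP", "VBD", "NN", "IN", "DT", "NN", "CC", "NN"]
gold = [2, 0, 2, 2, 6, 4, 6, 7]
pred = [2, 0, 2, 3, 6, 4, 6, 7]  # "with" wrongly attached to "pasta"

overall = attachment_accuracy(gold, pred, tags)            # 7 of 8 tokens correct
prepositions = attachment_accuracy(gold, pred, tags, "IN")  # 0 of 1 correct
conjunctions = attachment_accuracy(gold, pred, tags, "CC")  # 1 of 1 correct
print(overall, prepositions, conjunctions)
```

In this toy case a single misattached preposition barely moves the overall score but drives the preposition score to zero, which is why per-category figures such as those reported for conjunctions and prepositions are informative alongside the overall number.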