PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach in which multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that exploits the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over training them independently. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters than the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser achieves 89.4 F1 on all sentences from section 23 of the Penn Treebank.
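To make the combinatorial explosion concrete, the following sketch counts annotated variants of a single binary rule A → B C under k latent bits per symbol. The exact parameterization used by the paper is not reproduced here; this is a back-of-the-envelope illustration under the simplifying assumption that the unfactored grammar crosses all bits jointly (2^k subsymbols per nonterminal) while the factored grammar keeps k independent 1-bit factors.

```python
# Illustrative parameter counts for one binary rule A -> B C
# with k latent annotation bits per symbol. Assumed model sizes,
# not the paper's exact parameterization.

def unfactored_variants(k: int) -> int:
    # Joint annotation: each of the three symbols gets 2**k subsymbols,
    # so the rule expands to (2**k)**3 annotated variants.
    return (2 ** k) ** 3

def factored_variants(k: int) -> int:
    # Factored annotation: k independent 1-bit factors, each expanding
    # the rule into 2**3 variants; the totals add rather than multiply.
    return k * 2 ** 3

for k in range(1, 9):
    print(k, unfactored_variants(k), factored_variants(k))
```

At k = 8 bits, the joint expansion yields 256³ ≈ 16.8M variants of the rule versus 64 in the factored form, which is consistent with the abstract's claim of a parameter reduction of at least two orders of magnitude.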