Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
PCFG models of linguistic tree representations
Computational Linguistics
Three generative, lexicalised models for statistical parsing
ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Distributional clustering of English words
ACL '93 Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
Statistical decision-tree models for parsing
ACL '95 Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics
A new statistical parser based on bigram lexical dependencies
ACL '96 Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics
Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics
Three new probabilistic models for dependency parsing: an exploration
COLING '96 Proceedings of the 16th International Conference on Computational Linguistics - Volume 1
Recovering latent information in treebanks
COLING '02 Proceedings of the 19th International Conference on Computational Linguistics - Volume 1
Accurate unlexicalized parsing
ACL '03 Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics - Volume 1
Distributional phrase structure induction
CoNLL '01 Proceedings of the 2001 Workshop on Computational Natural Language Learning - Volume 7
Probabilistic CFG with latent annotations
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Annealing structural bias in multilingual weighted grammar induction
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Compiling Comp Ling: practical weighted dynamic programming and the Dyna language
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Head-driven PCFGs with latent-head statistics
Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
AAAI'96 Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2
Statistical parsing with a context-free grammar and word statistics
AAAI'97/IAAI'97 Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence
The Information bottleneck EM algorithm
UAI'03 Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence
Journal of the American Society for Information Science and Technology
Toward Tree Substitution Grammars with latent annotations
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Training factored PCFGs with expectation propagation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may, for example, split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in a way that respects patterns of linguistic feature passing: each node's nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned, but it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: improving the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size.
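To make the splitting idea concrete, below is a minimal, illustrative Python sketch of the Matsuzaki-style first step the abstract alludes to: every nonterminal X is split into subsymbols X[0], X[1], each rule is expanded into all annotated variants, and a little random jitter breaks the symmetry before EM training. The function name `split_grammar`, the grammar encoding, and the uppercase-means-nonterminal convention are assumptions made for this example; this is not the authors' actual system, and it omits the feature-passing constraint, preprocessing, and annealing described above.

```python
import itertools
import random
from collections import defaultdict


def split_grammar(rules, num_subsymbols=2, jitter=0.01, seed=0):
    """Split each nonterminal X into X[0..k-1] and expand every PCFG rule
    into all annotated variants (a toy version of latent-annotation splitting).

    `rules` maps (parent, children_tuple) -> probability.  By convention in
    this sketch, uppercase symbols are nonterminals and lowercase symbols are
    terminals.  Small random jitter keeps EM from starting at a symmetric
    saddle point where all subsymbols behave identically.
    """
    rng = random.Random(seed)
    split_rules = {}

    def variants(symbol):
        if symbol.isupper():  # nonterminal: produce k annotated copies
            return [f"{symbol}[{i}]" for i in range(num_subsymbols)]
        return [symbol]       # terminal: left unsplit

    for (parent, children), prob in rules.items():
        child_variant_lists = [variants(c) for c in children]
        n_child_combos = 1
        for v in child_variant_lists:
            n_child_combos *= len(v)
        for p in variants(parent):
            for kids in itertools.product(*child_variant_lists):
                # Spread the original rule's mass evenly, then perturb it.
                noisy = (prob / n_child_combos) * (1 + rng.uniform(-jitter, jitter))
                split_rules[(p, kids)] = noisy

    # Renormalize so rules sharing an annotated parent again sum to 1.
    totals = defaultdict(float)
    for (p, _), pr in split_rules.items():
        totals[p] += pr
    return {(p, kids): pr / totals[p] for (p, kids), pr in split_rules.items()}


if __name__ == "__main__":
    # Tiny toy grammar: probabilities of rules with the same parent sum to 1.
    toy_rules = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("det", "noun")): 0.7,
        ("NP", ("noun",)): 0.3,
        ("VP", ("verb", "NP")): 1.0,
    }
    for rule, prob in sorted(split_grammar(toy_rules).items()):
        print(rule, round(prob, 4))
```

In a full system, EM (inside-outside) would then reestimate these split-rule probabilities on the treebank so that NP[0] and NP[1] specialize; the jittered initialization is only the starting point.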