Learning structural dependencies of words in the Zipfian tail

Authors:
Tejaswini Deoskar;Markos Mylonakis;Khalil Sima'an
Affiliations:
University of Edinburgh;ILLC University of Amsterdam;ILLC University of Amsterdam
Venue:
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
Year:
2011

Citing 31
Cited 0

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Learning to classify text from labeled and unlabeled documents

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
The syntactic process

The syntactic process
Statistical Language Learning

Statistical Language Learning
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model

Computational Linguistics
PCFG models of linguistic tree representations

Computational Linguistics
Supertagging: an approach to almost parsing

Computational Linguistics
Assigning function tags to parsed text

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Bootstrapping statistical parsers from small datasets

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Generative models for statistical parsing with Combinatory Categorial Grammar

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Corpus-based induction of syntactic structure: models of dependency and constituency

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Trace prediction and recovery with unlexicalized PCFGs and slash features

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Efficient parsing of highly ambiguous context-free grammars with bit vectors

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
The importance of supertagging for wide-coverage CCG parsing

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Accurate function parsing

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Effective self-training for parsing

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Simple, robust, scalable semi-supervised learning via expectation regularization

Proceedings of the 24th international conference on Machine learning
Re-estimation of lexical parameters for treebank PCFGs

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Semi-supervised learning of dependency parsers using generalized expectation criteria

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

The Journal of Machine Learning Research
Induction of fine-grained lexical parameters of treebank pcfgs with inside-outside estimation and lexical transformations

Induction of fine-grained lexical parameters of treebank pcfgs with inside-outside estimation and lexical transformations
Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Posterior Regularization for Structured Latent Variable Models

The Journal of Machine Learning Research
Statistical parsing with a context-free grammar and word statistics

AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Unsupervised parse selection for HPSG

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Web-scale features for full-scale parsing

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

Using semi-supervised EM, we learn finegrained but sparse lexical parameters of a generative parsing model (a PCFG) initially estimated over the Penn Treebank. Our lexical parameters employ supertags, which encode complex structural information at the pre-terminal level, and are particularly sparse in labeled data -- our goal is to learn these for words that are unseen or rare in the labeled data. In order to guide estimation from unlabeled data, we incorporate both structural and lexical priors from the labeled data. We get a large error reduction in parsing ambiguous structures associated with unseen verbs, the most important case of learning lexico-structural dependencies. We also obtain a statistically significant improvement in labeled bracketing score of the treebank PCFG, the first successful improvement via semi-supervised EM of a generative structured model already trained over large labeled data.