Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the Eleventh Annual Conference on Computational Learning Theory
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics, Special Issue on Using Large Corpora: II
An efficient implementation of a new DOP model
EACL '03 Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, Volume 1
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics
Discriminative reranking for natural language parsing
Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Effective self-training for parsing
HLT-NAACL '06 Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Main Conference
Statistical parsing with a context-free grammar and word statistics
AAAI '97/IAAI '97 Proceedings of the Fourteenth National Conference on Artificial Intelligence and the Ninth Conference on Innovative Applications of Artificial Intelligence
Semi-supervised lexicon mining from parenthetical expressions in monolingual web pages
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Discriminative models for semi-supervised natural language learning
SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Cross-domain dependency parsing using a deep linguistic grammar
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1
Self-training PCFG grammars with latent annotations across languages
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2
An alternative to head-driven approaches for parsing a (relatively) free word-order language
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2
Correlating natural language parser performance with statistical measures of the text
KI '09 Proceedings of the 32nd Annual German Conference on Advances in Artificial Intelligence
DANLP 2010 Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
CADRA: context aware data retrieval architecture
International Journal of Advanced Intelligence Paradigms
ULISSE: an unsupervised algorithm for detecting reliable dependency parses
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
ACM Transactions on Asian Language Information Processing (TALIP)
Semi-supervised CCG lexicon extension
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Data point selection for self-training
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Self-training has been shown to improve state-of-the-art parser performance (McClosky et al., 2006), despite conventional wisdom and several earlier studies to the contrary (Charniak, 1997; Steedman et al., 2003). However, it has remained unclear when and why self-training helps. In this paper, we test four hypotheses: the presence of a phase transition, the impact of search errors, the value of non-generative reranker features, and the effects of unknown words. These experiments yield a better understanding of why self-training works for parsing. Because improvements from self-training correlate with unknown bigrams and unknown biheads, but not with unknown words, the benefit of self-training appears to come chiefly from seeing known words in new combinations.
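The last point suggests a simple diagnostic. Below is a minimal sketch, not the authors' code: the tokenized-sentence format and the function names are assumptions. It measures how much of a test set is novel relative to the training data at the word level versus the adjacent-bigram level; unknown biheads would be computed analogously from head-dependent word pairs read off parse trees.

from itertools import tee


def bigrams(tokens):
    # Yield adjacent word pairs from one tokenized sentence.
    a, b = tee(tokens)
    next(b, None)
    return zip(a, b)


def unknown_rates(train_sents, test_sents):
    # Return (unknown-word rate, unknown-bigram rate) of test relative to train.
    train_words = {w for sent in train_sents for w in sent}
    train_bigrams = {bg for sent in train_sents for bg in bigrams(sent)}

    test_words = [w for sent in test_sents for w in sent]
    test_bigrams = [bg for sent in test_sents for bg in bigrams(sent)]

    unk_words = sum(w not in train_words for w in test_words) / len(test_words)
    unk_bigrams = sum(bg not in train_bigrams for bg in test_bigrams) / len(test_bigrams)
    return unk_words, unk_bigrams


# Toy example: every test word is known, but "dog sat" is a new combination.
train = [["the", "cat", "sat"], ["the", "dog", "ran"]]
test = [["the", "dog", "sat"]]
print(unknown_rates(train, test))  # (0.0, 0.5)

On the paper's account, self-training gains should track the second rate, new combinations of known words, more closely than the first.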