Semi-supervised training of a statistical parser from unlabeled partially-bracketed data

Authors:
Rebecca Watson;Ted Briscoe;John Carroll
Affiliations:
University of Cambridge, UK;University of Cambridge, UK;University of Sussex, UK
Venue:
IWPT '07 Proceedings of the 10th International Conference on Parsing Technologies
Year:
2007

Citing 15
Cited 0

An efficient augmented-context-free parsing algorithm

Computational Linguistics
Head-driven statistical models for natural language parsing

Head-driven statistical models for natural language parsing
Tagging English text with a probabilistic model

Computational Linguistics
Does Baum-Welch re-estimation help taggers?

ANLC '94 Proceedings of the fourth conference on Applied natural language processing
Inside-outside reestimation from partially bracketed corpora

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Supervised grammar induction using training data with limited constituent information

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
The LinGO Redwoods treebank motivation and preliminary applications

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
Parsing the wall street journal using a Lexical-Functional Grammar and discriminative estimation techniques

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Parsing the WSJ using CCG and log-linear models

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Effective self-training for parsing

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
The second release of the RASP system

COLING-ACL '06 Proceedings of the COLING/ACL on Interactive presentation sessions
Maximum entropy estimation for feature forests

HLT '02 Proceedings of the second international conference on Human Language Technology Research
MAP adaptation of stochastic grammars

Computer Speech and Language
Efficient extraction of grammatical relations

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Adapting a probabilistic disambiguation model of an HPSG parser to a new domain

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

We compare the accuracy of a statistical parse ranking model trained from a fully-annotated portion of the Susanne treebank with one trained from unlabeled partially-bracketed sentences derived from this treebank and from the Penn Treebank. We demonstrate that confidence-based semi-supervised techniques similar to self-training outperform expectation maximization when both are constrained by partial bracketing. Both methods based on partially-bracketed training data outperform the fully supervised technique, and both can, in principle, be applied to any statistical parser whose output is consistent with such partial-bracketing. We also explore tuning the model to a different domain and the effect of in-domain data in the semi-supervised training processes.