Bootstrapping statistical parsers from small datasets

Authors:
Mark Steedman;Miles Osborne;Anoop Sarkar;Stephen Clark;Rebecca Hwa;Julia Hockenmaier;Paul Ruhlen;Steven Baker;Jeremiah Crim
Affiliations:
University of Edinburgh;University of Edinburgh;Simon Fraser University;University of Edinburgh;University of Maryland;University of Edinburgh;Johns Hopkins University;Cornell University;Johns Hopkins University
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.