Effective self-training for parsing

  • Authors:
  • David McClosky, Eugene Charniak, Mark Johnson

  • Affiliations:
  • Brown University, Providence, RI (all authors)

  • Venue:
  • HLT-NAACL '06: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Main Conference
  • Year:
  • 2006

Abstract

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
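The abstract only outlines the bootstrapping procedure, so the sketch below is one plausible reading of it rather than the authors' implementation. It abstracts the two phases (first-stage parser and discriminative reranker) as callables; every name here (self_train, train_parser, rerank, n_best) is a hypothetical placeholder.

    from typing import Callable, Iterable, List

    Tree = str  # placeholder type for a parse tree

    def self_train(
        train_parser: Callable[[List[Tree]], Callable[[str, int], List[Tree]]],
        rerank: Callable[[List[Tree]], Tree],
        labeled_trees: List[Tree],
        unlabeled_sentences: Iterable[str],
        n_best: int = 50,
    ) -> Callable[[str, int], List[Tree]]:
        """One round of self-training: parse unlabeled text, keep the
        reranker-selected parse for each sentence, and retrain the
        first-stage parser on gold plus automatically parsed trees."""
        parser = train_parser(labeled_trees)               # phase 1: train on gold data
        auto_trees: List[Tree] = []
        for sentence in unlabeled_sentences:
            candidates = parser(sentence, n_best)          # n-best parses from phase 1
            auto_trees.append(rerank(candidates))          # phase 2: reranker selects one
        return train_parser(labeled_trees + auto_trees)    # retrain on the augmented set

The key point the paper makes is that the parses fed back into training are the reranker's selections, not the first-stage parser's own top choices; whether and how the reranker itself is retrained is a detail the abstract does not specify.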