Bootstrapping POS taggers using unlabelled data

  • Authors:
  • Stephen Clark; James R. Curran; Miles Osborne

  • Affiliations:
  • University of Edinburgh, Edinburgh; University of Edinburgh, Edinburgh; University of Edinburgh, Edinburgh

  • Venue:
  • CoNLL '03: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4
  • Year:
  • 2003

Abstract

This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other's output. Since the output of the taggers is noisy, there is a question of which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method which has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training can significantly improve tagging performance for small seed datasets. Further results show that this form of co-training considerably outperforms self-training. However, we find that simply re-training on all the newly labelled data can, in some cases, yield comparable results to agreement-based co-training, with only a fraction of the computational cost.
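
To make the selection step concrete, here is a minimal Python sketch of agreement-based co-training under stated assumptions: the two taggers are generic objects exposing hypothetical `train(pairs)` and `tag(words)` methods (the abstract specifies no API), and candidate subsets of the freshly labelled cache are searched at random rather than by the paper's exact procedure. Each round, the subset that maximises inter-tagger agreement on held-out unlabelled data is added to both training sets.

```python
import random

def agreement(tagger_a, tagger_b, sentences):
    """Per-token agreement rate of two taggers on unlabelled sentences."""
    same = total = 0
    for words in sentences:
        for ta, tb in zip(tagger_a.tag(words), tagger_b.tag(words)):
            same += int(ta == tb)
            total += 1
    return same / total if total else 0.0

def cotrain(tagger_a, tagger_b, seed, unlabelled, held_out,
            rounds=50, cache_size=500, n_candidates=10):
    """Agreement-based co-training (simplified sketch).

    seed       -- list of (words, tags) pairs of hand-labelled sentences
    unlabelled -- list of word lists, consumed cache_size at a time
    held_out   -- list of word lists used only to score agreement
    """
    train_a, train_b = list(seed), list(seed)
    tagger_a.train(train_a)
    tagger_b.train(train_b)
    pool = list(unlabelled)
    for _ in range(rounds):
        cache, pool = pool[:cache_size], pool[cache_size:]
        if not cache:
            break
        # Each tagger labels the cache; its output is candidate
        # training material for the *other* tagger.
        labelled_a = [(w, tagger_a.tag(w)) for w in cache]
        labelled_b = [(w, tagger_b.tag(w)) for w in cache]
        best = (agreement(tagger_a, tagger_b, held_out), train_a, train_b)
        for _ in range(n_candidates):
            idx = random.sample(range(len(cache)), len(cache) // 2)
            cand_a = train_a + [labelled_b[i] for i in idx]
            cand_b = train_b + [labelled_a[i] for i in idx]
            tagger_a.train(cand_a)
            tagger_b.train(cand_b)
            score = agreement(tagger_a, tagger_b, held_out)
            if score > best[0]:
                best = (score, cand_a, cand_b)
        _, train_a, train_b = best
        # Leave both taggers trained on the winning sets for the next round.
        tagger_a.train(train_a)
        tagger_b.train(train_b)
    return tagger_a, tagger_b
```

Replacing the subset search with `train_a + labelled_b` and `train_b + labelled_a` in full gives the naive variant the abstract mentions: re-training on all newly labelled data, which can match agreement-based selection at a fraction of the computational cost.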