Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Enhancing Supervised Learning with Unlabeled Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Tagging English text with a probabilistic model
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Does Baum-Welch re-estimation help taggers?
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Investigating GIS and smoothing for maximum entropy taggers
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Bootstrapping statistical parsers from small datasets
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Applying co-training methods to statistical parsing
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
A comparison of algorithms for maximum entropy parameter estimation
COLING-02 Proceedings of the 6th conference on Natural language learning - Volume 20
Blueprint for a high performance NLP infrastructure
SEALTS '03 Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems - Volume 8
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A backoff model for bootstrapping resources for non-English languages
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Effective self-training for parsing
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
A robust multilingual portable phrase chunking system
Expert Systems with Applications: An International Journal
Semi-supervised co-training and active learning based approach for multi-view intrusion detection
Proceedings of the 2009 ACM symposium on Applied Computing
Combining Language Modeling and Discriminative Classification for Word Segmentation
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Spoken language understanding using weakly supervised learning
Computer Speech and Language
Co-training for cross-lingual sentiment classification
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Semi-supervised learning for automatic prosodic event detection using co-training algorithm
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Faster parsing by supertagger adaptation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Simple semi-supervised training of part-of-speech taggers
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Adapting self-training for semantic role labeling
ACLstudent '10 Proceedings of the ACL 2010 Student Research Workshop
Viterbi training improves unsupervised dependency parsing
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Uptraining for accurate deterministic question parsing
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A multi-domain web-based algorithm for POS tagging of unknown words
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
HITS-based seed selection and stop list construction for bootstrapping
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Self-training and co-training in biomedical word sense disambiguation
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Assessing the practical usability of an automatically annotated corpus
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Bilingual co-training for sentiment classification of Chinese product reviews
Computational Linguistics
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Recall-oriented learning of named entities in Arabic Wikipedia
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Improved parsing and POS tagging using inter-sentence consistency constraints
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other's output. Since the taggers' output is noisy, there is a question of which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method that has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training can significantly improve tagging performance for small seed datasets, and that it considerably outperforms self-training. However, we find that simply re-training on all of the newly labelled data can, in some cases, yield results comparable to agreement-based co-training at only a fraction of the computational cost.
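The co-training loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the paper uses full POS taggers (TnT and a maximum-entropy tagger, per the references above) and agreement-based example selection, whereas this sketch uses two toy unigram lookup taggers over different "views" of a word (identity vs. suffix) and the naive "add everything the other tagger labels" variant the abstract compares against. The names `LookupTagger` and `co_train` are hypothetical.

```python
from collections import Counter, defaultdict

class LookupTagger:
    """Toy unigram tagger over a configurable word 'view' (hypothetical helper)."""
    def __init__(self, view):
        self.view = view   # function mapping a word to a feature
        self.table = {}    # feature -> most frequent tag seen in training

    def fit(self, data):
        counts = defaultdict(Counter)
        for word, tag in data:
            counts[self.view(word)][tag] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}

    def tag(self, word):
        return self.table.get(self.view(word))  # None if unseen

def co_train(t1, t2, seed, pool, rounds=3):
    """Naive co-training: each round, each tagger labels the unlabelled pool
    and its labelled examples are added to the *other* tagger's training set.
    (Duplicate additions across rounds are harmless in this toy setting.)"""
    train1, train2 = list(seed), list(seed)
    for _ in range(rounds):
        t1.fit(train1)
        t2.fit(train2)
        train2 += [(w, t1.tag(w)) for w in pool if t1.tag(w) is not None]
        train1 += [(w, t2.tag(w)) for w in pool if t2.tag(w) is not None]
    t1.fit(train1)
    t2.fit(train2)

# Tiny demonstration with an invented seed set and unlabelled pool.
seed = [("running", "VBG"), ("dog", "NN")]
pool = ["jumping", "walking", "fox"]
word_tagger = LookupTagger(lambda w: w)         # view 1: word identity
suffix_tagger = LookupTagger(lambda w: w[-3:])  # view 2: last three letters
co_train(word_tagger, suffix_tagger, seed, pool)
```

After training, the suffix tagger's "-ing" generalisation lets it label pool words the identity tagger has never seen, and those labels propagate into the identity tagger's training set: `word_tagger.tag("jumping")` becomes `"VBG"`. The agreement-based selection studied in the paper would instead add only the subset of newly labelled examples that maximises the two taggers' agreement on held-out unlabelled data.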