Semi-supervised training for the averaged perceptron POS tagger

Authors:
Drahomíra "johanka" Spoustová; Jan Hajič; Jan Raab; Miroslav Spousta
Affiliations:
Charles University Prague, Czech Republic (all authors)
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2009

References (9)

Bagging predictors
Machine Learning

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

Improving accuracy in word class tagging through the combination of machine learning systems
Computational Linguistics

TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Classifier combination for improved lexical disambiguation
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1

Tagging inflective languages: prediction of morphological categories for a rich, structured tagset
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1

Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10

The best of two worlds: cooperation of statistical and rule-based taggers for Czech
ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies

Cited by (16)

The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task

Polynomial to linear: efficient classification with conjunctive features
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Simple semi-supervised training of part-of-speech taggers
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers

Influence of pre-annotation on POS-tagged corpus development
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop

RDRCE: combining machine learning and knowledge acquisition
PKAW'10 Proceedings of the 11th international conference on Knowledge management and acquisition for smart systems and services

Part-of-speech tagging from 97% to 100%: is it time for some linguistics?
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I

Semisupervised condensed nearest neighbor for part-of-speech tagging
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Evaluation without references: IBM1 scores as evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

Morphemes and POS tags for n-gram based evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

Using Wiktionary to improve lexical disambiguation in multiple languages
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

A cost sensitive part-of-speech tagging: differentiating serious errors from minor errors
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Fast and robust part-of-speech tagging using dynamic model selection
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2

Morpheme- and POS-based IBM1 scores and language model scores for translation quality estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging
Language Resources and Evaluation

Rule-based morphological tagger for an inflectional language
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems

Morphological and syntactic case in statistical dependency parsing
Computational Linguistics

Abstract

This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens) combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion showed a significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech (4.12 % and 4.86 % relative error reduction, respectively; absolute accuracies being 97.44 % and 95.89 %).
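
The relative error reduction figures quoted in the abstract can be related to the absolute accuracies with a little arithmetic. The sketch below is only an illustration, assuming the usual definition (baseline error minus new error, divided by baseline error); the function names are hypothetical, and the baseline accuracies it derives are implied by the quoted numbers rather than stated in the abstract.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error reduction in %, given two accuracies in %.

    Assumes the common definition: (e_base - e_new) / e_base.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc, rer):
    """Baseline accuracy in % implied by a new accuracy and an RER, both in %.

    Inverts the definition above: e_base = e_new / (1 - RER).
    """
    new_err = 100.0 - new_acc
    baseline_err = new_err / (1.0 - rer / 100.0)
    return 100.0 - baseline_err


# Figures quoted in the abstract (accuracy %, relative error reduction %).
english = implied_baseline_accuracy(97.44, 4.12)
czech = implied_baseline_accuracy(95.89, 4.86)

print(f"implied English baseline accuracy: {english:.2f} %")
print(f"implied Czech baseline accuracy:   {czech:.2f} %")
```

Under this definition, the quoted figures correspond to earlier best accuracies of roughly 97.33 % for English and 95.68 % for Czech. At this accuracy level, a gain of about 0.1 to 0.2 percentage points removes a few percent of the remaining errors.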