Machine Learning
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Improving accuracy in word class tagging through the combination of machine learning systems
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Classifier combination for improved lexical disambiguation
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
The best of two worlds: cooperation of statistical and rule-based taggers for Czech
ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task
Polynomial to linear: efficient classification with conjunctive features
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Simple semi-supervised training of part-of-speech taggers
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Influence of pre-annotation on POS-tagged corpus development
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
RDRCE: combining machine learning and knowledge acquisition
PKAW'10 Proceedings of the 11th international conference on Knowledge management and acquisition for smart systems and services
Part-of-speech tagging from 97% to 100%: is it time for some linguistics?
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Semisupervised condensed nearest neighbor for part-of-speech tagging
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Evaluation without references: IBM1 scores as evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Morphemes and POS tags for n-gram based evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Using wiktionary to improve lexical disambiguation in multiple languages
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
A cost sensitive part-of-speech tagging: differentiating serious errors from minor errors
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Fast and robust part-of-speech tagging using dynamic model selection
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Morpheme- and POS-based IBM1 scores and language model scores for translation quality estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging
Language Resources and Evaluation
Rule-based morphological tagger for an inflectional language
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems
Morphological and syntactic case in statistical dependency parsing
Computational Linguistics
This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens) combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion showed a significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech (4.12% and 4.86% relative error reduction, respectively; absolute accuracies being 97.44% and 95.89%).
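The relative error reductions quoted above can be related to the absolute accuracies with a little arithmetic. The following is a minimal sketch, assuming the conventional definition of relative error reduction, (baseline error - new error) / baseline error; the abstract does not spell out the formula, and the function names are hypothetical.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed by the new system.

    Both accuracies are fractions in [0, 1]. Assumes the conventional
    definition; the abstract itself does not state the formula.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, rer: float) -> float:
    """Invert the formula: baseline accuracy implied by a new accuracy and a reduction."""
    new_err = 1.0 - new_acc
    return 1.0 - new_err / (1.0 - rer)


# Figures quoted in the abstract: (absolute accuracy, relative error reduction).
reported = {"English": (0.9744, 0.0412), "Czech": (0.9589, 0.0486)}

for language, (acc, rer) in reported.items():
    base = implied_baseline_accuracy(acc, rer)
    print(f"{language}: implied baseline accuracy {base:.2%}")
```

Under that assumption, the quoted figures would correspond to previous best accuracies of roughly 97.33% for English and 95.68% for Czech, which illustrates how a relative reduction of a few percent translates into an absolute gain of only one or two tenths of a percentage point at this accuracy level.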