Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
Weakly supervised natural language learning without redundant views
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Language dynamics and capitalization using maximum entropy
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Processing natural language without natural language processing
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
A bootstrapping approach for training a NER with conditional random fields
EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence
For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We show that updating the unlabeled data is sufficient to maintain quality over time, and that it outperforms updating the labeled data. We also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.
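The abstract above describes a semi-supervised (bootstrapping) setup in which a tagger trained on a small labeled seed set repeatedly labels unlabeled data and retrains on its most confident predictions. A minimal, self-contained sketch of that self-training loop, using a toy nearest-centroid classifier in place of a real CRF-based NE tagger (the function names, data, and confidence heuristic here are illustrative assumptions, not taken from the paper):

```python
# Self-training sketch: a seed model pseudo-labels its most confident
# unlabeled points each round and is retrained on the grown labeled set.

def centroid(points):
    # Component-wise mean of a list of equal-length tuples.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    # Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, unlabeled, rounds=3, top_k=1):
    """labeled: dict point -> label (the seed set); unlabeled: list of points.
    Returns the labeled dict grown with pseudo-labeled points."""
    labeled = dict(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # "Train": one centroid per label from the current labeled set.
        by_label = {}
        for p, y in labeled.items():
            by_label.setdefault(y, []).append(p)
        cents = {y: centroid(ps) for y, ps in by_label.items()}
        # Score unlabeled points; confidence = margin between the two
        # nearest centroids (larger margin = more confident prediction).
        scored = []
        for p in pool:
            ds = sorted((dist(p, c), y) for y, c in cents.items())
            margin = ds[1][0] - ds[0][0] if len(ds) > 1 else ds[0][0]
            scored.append((margin, p, ds[0][1]))
        scored.sort(reverse=True)
        # Promote the top_k most confident pseudo-labels to the labeled set.
        for _, p, y in scored[:top_k]:
            labeled[p] = y
            pool.remove(p)
        if not pool:
            break
    return labeled
```

Updating the *unlabeled* data, as the abstract advocates, corresponds to swapping in a current `unlabeled` pool while keeping the older labeled seed set fixed.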