Updating a name tagger using contemporary unlabeled data

Authors:
Cristina Mota; Ralph Grishman
Affiliations:
L2F (INESC-ID) & IST & NYU, Lisboa, Portugal; New York University, New York, NY
Venue:
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Year:
2009

Abstract

For many NLP tasks, including named entity (NE) tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of tagging new data with a semi-supervised NE tagger that was trained on data from an earlier time period. We show that updating the unlabeled data is sufficient to maintain tagging quality over time and outperforms updating the labeled data. Furthermore, we show that, in most cases, augmenting the current unlabeled data with older data does not yield better performance than simply using a smaller amount of current unlabeled data.
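The abstract does not spell out the tagger's learning algorithm. The following minimal self-training sketch in Python is only a rough illustration of what "updating the unlabeled data" can mean. It assumes a toy tagger that labels already-detected names as PER or ORG from two features: the name's spelling and a single context word. The functions `train_rules`, `predict`, and `self_train`, the confidence threshold, and the toy data are all hypothetical. This is generic self-training, not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical toy setup (not the paper's tagger):
# a labeled example is (name, context_word, label); an unlabeled one is (name, context_word).


def train_rules(labeled):
    """Count how often each name spelling and each context word co-occurs with each label."""
    counts = defaultdict(Counter)
    for name, context, label in labeled:
        counts[("name", name)][label] += 1
        counts[("ctx", context)][label] += 1
    return counts


def predict(counts, name, context, threshold=0.9):
    """Return the majority label from the name and context features, or None if unsure."""
    votes = Counter()
    for key in (("name", name), ("ctx", context)):
        votes.update(counts.get(key, Counter()))
    total = sum(votes.values())
    if total == 0:
        return None  # no evidence from either feature
    label, top = votes.most_common(1)[0]
    return label if top / total >= threshold else None


def self_train(seed, unlabeled, rounds=5, threshold=0.9):
    """Grow the labeled set by adding confidently tagged unlabeled examples each round."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        counts = train_rules(labeled)
        newly_labeled, remaining = [], []
        for name, context in pool:
            label = predict(counts, name, context, threshold)
            if label is None:
                remaining.append((name, context))
            else:
                newly_labeled.append((name, context, label))
        if not newly_labeled:
            break  # the current model is not confident about anything left in the pool
        labeled.extend(newly_labeled)
        pool = remaining
    return train_rules(labeled)


# Hypothetical toy data: the seed stands in for an "older" period and the
# unlabeled pool for a "current" one that mentions names absent from the seed.
seed = [("Smith", "said", "PER"), ("Acme", "shares", "ORG")]
current_unlabeled = [("Smith", "met"), ("Jones", "met"),
                     ("Acme", "acquired"), ("Zenith", "acquired")]

old_model = train_rules(seed)
new_model = self_train(seed, current_unlabeled)
print(predict(old_model, "Jones", "met"))  # None: nothing known about "Jones" or "met"
print(predict(new_model, "Jones", "met"))  # PER
```

In this toy run, the seed-only model cannot tag "Jones". Self-training on the current pool first labels the familiar names ("Smith", "Acme"), which in turn teaches it the new context words ("met", "acquired"). Those context words then let it label the new names in the next round.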