Updating a name tagger using contemporary unlabeled data

  • Authors:
  • Cristina Mota;Ralph Grishman

  • Affiliations:
  • L2F (INESC-ID) & IST & NYU, Lisboa Portugal;New York University, New York NY

  • Venue:
  • ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maintain quality over time, and outperforms updating the labeled data. Furthermore, we will also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.