Data selection in semi-supervised learning for name tagging

  • Authors:
  • Heng Ji; Ralph Grishman

  • Affiliations:
  • New York University, New York, NY; New York University, New York, NY

  • Venue:
  • IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
  • Year:
  • 2006

Abstract

We present two semi-supervised learning techniques that improve a state-of-the-art multilingual name tagger. For English and Chinese, the overall system gains 1.7% to 2.1% in F-measure, a 13.5% to 17.4% relative reduction in spurious, missing, and incorrect tags. We also conclude that simply relying on large corpora is not sufficient in itself: the selection of unlabeled data matters as well. We describe effective measures for automatically selecting documents and sentences.
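The abstract reports its gains in two forms: absolute F-measure points and a relative reduction in tagging errors. The sketch below shows one common way these figures relate. It assumes that the relative figure is measured against the baseline's shortfall from a perfect F-measure of 100, because the abstract does not state the formula. The function name `relative_error_reduction` and the scores in the example are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(f_baseline: float, f_new: float) -> float:
    """Return the percentage of the baseline's remaining error removed by the new system.

    Assumption (not stated in the abstract): error is approximated as
    100 - F-measure, with both F-measures given in percent.
    """
    baseline_error = 100.0 - f_baseline
    if baseline_error <= 0:
        raise ValueError("baseline F-measure must be below 100")
    # Absolute gain divided by the baseline's remaining error, as a percentage.
    return 100.0 * (f_new - f_baseline) / baseline_error


# Hypothetical scores: a 2.0-point gain over an 85.0 baseline.
print(round(relative_error_reduction(85.0, 87.0), 1))  # 13.3
```

Under this reading, the same absolute gain counts as a larger relative reduction when the baseline is already strong, because less error remains to be removed.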