Exploiting unannotated corpora for tagging and chunking

  • Authors: Rie Kubota Ando

  • Affiliations: IBM T.J. Watson Research Center, Hawthorne, NY

  • Venue: ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
  • Year: 2004


Abstract

We present a method that exploits unannotated corpora to compensate for the paucity of annotated training data on tagging and chunking tasks. It collects feature frequencies from a large unannotated corpus and compresses them for use by linear classifiers. Experiments on two tasks show that it consistently produces significant performance improvements.
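The general idea of turning unannotated-corpus frequencies into classifier features can be illustrated with a small sketch. The abstract does not specify which features are counted or how they are compressed. This sketch therefore makes its own assumptions. It counts left/right neighbour words for each word type. As a stand-in compression, it keeps only the `k` globally most frequent context features and normalises them per word. This is not necessarily the paper's method, and all function names (`collect_context_counts`, `compress`, `token_features`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def collect_context_counts(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count left and right neighbour words of every token in unannotated text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # sentence-boundary markers
        for i in range(1, len(padded) - 1):
            word = padded[i]
            counts[word]["L=" + padded[i - 1]] += 1
            counts[word]["R=" + padded[i + 1]] += 1
    return counts


def compress(counts: Dict[str, Counter], k: int) -> Dict[str, Dict[str, float]]:
    """Stand-in compression (an assumption): keep the k most frequent context
    features corpus-wide, then turn each word's kept counts into relative frequencies."""
    totals: Counter = Counter()
    for ctx in counts.values():
        totals.update(ctx)
    kept = {f for f, _ in totals.most_common(k)}
    compressed: Dict[str, Dict[str, float]] = {}
    for word, ctx in counts.items():
        sub = {f: c for f, c in ctx.items() if f in kept}
        z = sum(sub.values())
        compressed[word] = {f: c / z for f, c in sub.items()} if z else {}
    return compressed


def token_features(word: str, compressed: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sparse features for a linear classifier: the word identity plus
    real-valued features derived from the unannotated corpus (if any)."""
    feats = {"w=" + word: 1.0}
    for f, v in compressed.get(word, {}).items():
        feats["u:" + f] = v
    return feats


# Hypothetical toy "unannotated corpus".
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
compressed = compress(collect_context_counts(corpus), k=4)
print(token_features("cat", compressed))
print(token_features("dog", compressed))
```

Here "cat" and "dog" receive identical corpus-derived features. A linear classifier trained on annotated data that contains only one of them can therefore still generalise to the other. This is the kind of benefit unannotated data is meant to provide when annotated data is scarce.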