Self-training for biomedical parsing

  • Authors:
  • David McClosky; Eugene Charniak

  • Affiliations:
  • Brown University, Providence, RI; Brown University, Providence, RI

  • Venue:
  • HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
  • Year:
  • 2008

Abstract

Parser self-training is the technique of taking an existing parser, parsing extra data and then creating a second parser by treating the extra data as further training data. Here we apply this technique to parser adaptation. In particular, we self-train the standard Charniak/Johnson Penn-Treebank parser using unlabeled biomedical abstracts. This achieves an f-score of 84.3% on a standard test set of biomedical abstracts from the Genia corpus. This is a 20% error reduction over the best previous result on biomedical data (80.2% on the same test set).
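The 20% figure can be checked from the two f-scores quoted in the abstract. The sketch below assumes that "error" means 100 minus the f-score, which is the usual convention but is not stated explicitly in the abstract; the function name `relative_error_reduction` is illustrative only.

```python
def relative_error_reduction(baseline_f, new_f):
    """Fraction of the baseline's error removed by the new system.

    Assumes error = 100 - f-score, with both scores given in percent.
    """
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


# F-scores quoted in the abstract: 80.2% (previous best), 84.3% (self-trained).
reduction = relative_error_reduction(80.2, 84.3)
print(f"{reduction:.1%}")  # prints "20.7%"
```

Under that assumption the error drops from 19.8 to 15.7 points, a relative reduction of about 20.7%, which matches the roughly 20% stated in the abstract.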