Subdomain adaptation of a POS tagger with a small corpus

  • Authors:
  • Yuka Tateisi; Yoshimasa Tsuruoka; Jun-ichi Tsujii

  • Affiliations:
  • Kogakuin University, Shinjuku-ku, Tokyo, Japan; University of Manchester, Manchester, U.K.; University of Tokyo, Bunkyo-ku, Tokyo, Japan and University of Manchester, Manchester, U.K.

  • Venue:
  • LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
  • Year:
  • 2006

Abstract

For the domain of biomedical research abstracts, two large corpora, namely GENIA (Kim et al. 2003) and Penn BioIE (Kulick et al. 2004), are available. Both are essentially in the human domain, and the performance of systems trained on these corpora when applied to abstracts dealing with other species is unknown. In machine-learning-based systems, re-training the model with the addition of corpora in the target domain has achieved promising results (e.g. Tsuruoka et al. 2005, Lease et al. 2005). In this paper, we compare two methods for adapting POS taggers trained on the GENIA and Penn BioIE corpora to the Drosophila melanogaster (fruit fly) domain.