Protein name tagging for biomedical annotation in text

  • Authors:
  • Kaoru Yamamoto;Taku Kudo;Akihiko Konagaya;Yuji Matsumoto

  • Affiliations:
  • The Institute of Physical and Chemical Research, Suehiro-cho, Tsurumi-ku, Yokohama, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan;The Institute of Physical and Chemical Research, Suehiro-cho, Tsurumi-ku, Yokohama, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan

  • Venue:
  • BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
  • Year:
  • 2003

Abstract

We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking over morphemes, the smallest units produced by the morphological analysis. This helps to recognize the exact boundaries of protein names. Moreover, our morphological analyzer can handle compounds, which offers a simple way to adapt name descriptions from biomedical resources for language processing. On GENIA corpus 3.01, our method attains an F-score of 70 points for protein molecule names and 75 points for protein names including molecules, families, and domains.
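The idea of chunking over morphemes can be illustrated with a minimal sketch. It is not the authors' system: the B/I/O tag scheme, the example sentence, its hand-written segmentation, and the helper names `bio_to_spans` and `f_score` are all assumptions made for illustration, and the score uses the standard F-measure with exact-boundary matching, which the abstract does not spell out.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) morpheme indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect name spans from a B/I/O tag sequence (assumed tag scheme)."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new name begins here; close any name that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f_score(gold: List[Span], pred: List[Span]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    correct = len(set(gold) & set(pred))
    if correct == 0:
        return 0.0
    precision = correct / len(set(pred))
    recall = correct / len(set(gold))
    return 2 * precision * recall / (precision + recall)


# Hypothetical morpheme sequence: the hyphenated compound is split, so a name
# boundary can fall inside a string that whitespace tokenization keeps whole.
morphemes = ["IL", "-", "2", "-", "mediated", "activation", "of",
             "NF", "-", "kappa", "B"]
gold_tags = ["B", "I", "I", "O", "O", "O", "O", "B", "I", "I", "I"]
pred_tags = ["B", "I", "I", "O", "O", "O", "O", "O", "O", "B", "I"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
names = [" ".join(morphemes[s:e]) for s, e in gold_spans]
score = f_score(gold_spans, pred_spans)
```

In this toy case the first name is recovered exactly, while the second has the wrong left boundary and therefore counts as an error under exact matching. Splitting "IL-2-mediated" into smaller units is what allows the chunk to end after "2" rather than covering the whole hyphenated string.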