Tuning support vector machines for biomedical named entity recognition

  • Authors:
  • Jun'ichi Kazama; Takaki Makino; Yoshihiro Ohta; Jun'ichi Tsujii

  • Affiliations:
  • University of Tokyo, Bunkyo-ku, Tokyo, Japan; University of Tokyo, Bunkyo-ku, Tokyo, Japan; Hitachi, Ltd., Kokubunji, Tokyo, Japan; University of Tokyo, Bunkyo-ku, Tokyo, Japan

  • Venue:
  • BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
  • Year:
  • 2002

Abstract

We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make SVM training tractable on the largest available corpus, the GENIA corpus, we propose splitting the non-entity class into sub-classes using part-of-speech information. In addition, we explore new features such as a word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class-splitting technique not only makes training on the GENIA corpus feasible but also improves accuracy. The proposed new features also contribute to improving accuracy. We compare our SVM-based recognition system with a system that uses a maximum entropy tagging method.
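
The class-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes BIO-style tags with "O" as the non-entity label and Penn Treebank-style part-of-speech tags, neither of which the abstract specifies. The function names and the example sentence are hypothetical.

```python
def split_non_entity_class(tags, pos_tags, outside="O"):
    """Relabel each non-entity tag as '<outside>-<POS>'; entity tags stay unchanged.

    Assumption: non-entity tokens carry the single label `outside` ("O" here).
    """
    if len(tags) != len(pos_tags):
        raise ValueError("tags and pos_tags must have the same length")
    return [
        f"{outside}-{pos}" if tag == outside else tag
        for tag, pos in zip(tags, pos_tags)
    ]


def merge_non_entity_class(tags, outside="O"):
    """Map every split sub-class back to the single non-entity label."""
    prefix = outside + "-"
    return [outside if tag.startswith(prefix) else tag for tag in tags]


# Hypothetical example sentence, labels, and POS tags (illustration only).
tokens = ["IL-2", "gene", "expression", "is", "regulated"]
pos_tags = ["NN", "NN", "NN", "VBZ", "VBN"]
tags = ["B-DNA", "I-DNA", "O", "O", "O"]

split_tags = split_non_entity_class(tags, pos_tags)
print(list(zip(tokens, split_tags)))
```

In this sketch a classifier would be trained on the split labels, and its predictions would be merged back to the single non-entity label before evaluation.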