Two-phase biomedical NE recognition based on SVMs

  • Authors:
  • Ki-Joong Lee;Young-Sook Hwang;Hae-Chang Rim

  • Affiliations:
  • Korea University, Anam-dong, SEOUL, Korea;Korea University, Anam-dong, SEOUL, Korea;Korea University, Anam-dong, SEOUL, Korea

  • Venue:
  • BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
  • Year:
  • 2003

Abstract

When SVMs are used for named entity recognition, we are often confronted with the multi-class problem. The larger the number of classes, the more severe the problem becomes. In particular, the one-vs-rest method tends to degrade performance because it produces a severely unbalanced class distribution. In this study, to tackle the problem, we adopt a two-phase named entity recognition method based on SVMs and a dictionary: in the first phase, we identify each entity with an SVM classifier and post-process the identified entities by a simple dictionary look-up; in the second phase, we classify the semantic class of each identified entity with SVMs. By dividing the task into two subtasks, i.e. entity identification and semantic classification, the unbalanced class distribution problem can be alleviated. Furthermore, we can select the features relevant to each subtask and use a different classification method for each. The experimental results on the GENIA corpus show that the proposed method is effective not only in reducing training cost but also in improving performance: the identification performance is about 79.9 (Fβ=1) and the semantic classification accuracy is about 66.5 (Fβ=1).
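The class-imbalance argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the label counts are invented for illustration (they are not GENIA statistics), the function names `one_vs_rest_ratios` and `two_phase_ratios` are hypothetical, and tokens are treated as independent labels rather than tagged spans. The code only compares the share of positive examples each binary classifier would see under a one-vs-rest setup and under a two-phase split.

```python
from collections import Counter


def one_vs_rest_ratios(labels):
    """Share of positive examples for each one-vs-rest classifier.

    Every classifier is trained on all tokens, so the many non-entity
    ("O") tokens always count as negatives.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items() if cls != "O"}


def two_phase_ratios(labels):
    """Share of positive examples in each phase of a two-phase split.

    Phase 1: entity vs. non-entity over all tokens.
    Phase 2: one classifier per semantic class over entity tokens only.
    """
    entity_labels = [lab for lab in labels if lab != "O"]
    phase1 = len(entity_labels) / len(labels)
    counts = Counter(entity_labels)
    phase2 = {cls: n / len(entity_labels) for cls, n in counts.items()}
    return phase1, phase2


# Toy label distribution, assumed purely for illustration.
labels = ["O"] * 80 + ["protein"] * 10 + ["DNA"] * 6 + ["cell_type"] * 4

ovr = one_vs_rest_ratios(labels)
phase1, phase2 = two_phase_ratios(labels)

for cls in ovr:
    print(f"{cls}: one-vs-rest {ovr[cls]:.2f}, phase 2 {phase2[cls]:.2f}")
print(f"phase 1 (entity vs. non-entity): {phase1:.2f}")
```

In this toy setting every semantic class has a larger positive share in phase 2 than under one-vs-rest, because the non-entity tokens are filtered out in phase 1 before semantic classification.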