SVM-Based biological named entity recognition using minimum edit-distance feature boosted by virtual examples

  • Authors:
  • Eunji Yi;Gary Geunbae Lee;Yu Song;Soo-Jun Park

  • Affiliations:
  • Department of CSE, POSTECH, Pohang, Korea;Department of CSE, POSTECH, Pohang, Korea;Department of CSE, POSTECH, Pohang, Korea;Bioinformatics Research Team, Computer and Software Research Lab. ETRI, Taejon, Korea

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

In this paper, we propose two independent solutions to the problems of spelling variants and the lack of annotated corpora, which are the main difficulties in SVM (Support Vector Machine) and other machine-learning-based biological named entity recognition. To address spelling variants, we propose using edit distance as a feature for the SVM. To address the lack of annotated corpora, we propose using virtual examples, which allow the annotated corpus to be expanded automatically, quickly, and easily. The experimental results show that introducing the edit-distance feature yields some improvement, and that the model trained on the corpus expanded with virtual examples outperforms the model trained on the original corpus. Finally, we achieved an F-measure of 71.46% (64.03% precision, 80.84% recall) in a five-category named entity recognition experiment on the GENIA corpus (version 3.0).
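As an illustration of the edit-distance idea, the following is a minimal sketch of how a minimum edit-distance feature might be computed for a token. The abstract does not say how the distance is defined or normalized, so the unit-cost Levenshtein distance, the lowercasing, the length normalization, and the names `edit_distance`, `min_edit_distance_feature`, and `lexicon` are all assumptions made for this example rather than the authors' method.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings (assumed metric)."""
    # prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def min_edit_distance_feature(token: str, lexicon: Iterable[str]) -> float:
    """Smallest length-normalized distance from token to any lexicon entry.

    Returns a value in [0, 1]; 1.0 if the lexicon is empty.
    The lexicon is a hypothetical list of known entity names.
    """
    token = token.lower()
    best = 1.0
    for entry in lexicon:
        entry = entry.lower()
        longest = max(len(token), len(entry))
        if longest == 0:
            return 0.0
        best = min(best, edit_distance(token, entry) / longest)
    return best


if __name__ == "__main__":
    # A spelling variant lies close to a known entity name.
    print(min_edit_distance_feature("NF-kappaB", ["NF-kappa B", "IL-2"]))
```

A small value indicates that the token is a near spelling variant of a known name, so it could be supplied to the classifier as a numeric feature or bucketed into discrete ranges; which encoding suits an SVM-based tagger is a design choice not covered by the abstract.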