Annotating multiple types of biomedical entities: a single word classification approach

  • Authors:
  • Chih Lee;Wen-Juan Hou;Hsin-Hsi Chen

  • Affiliations:
  • National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan

  • Venue:
  • JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
  • Year:
  • 2004

Abstract

Named entity recognition is a fundamental task in biomedical data mining. Multiple-class annotation is more challenging than single-class annotation. In this paper, we take a single-word classification approach to the multiple-class annotation problem using Support Vector Machines (SVMs). Word attributes, the results of existing gene/protein name taggers, context, and other information are important features for classification. During training, the size of the training data and the distribution of named entities are taken into account. The preliminary results show that the approach might be feasible when more training data is used to alleviate the data imbalance problem.
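
As a rough illustration of what classifying each word on its own can look like, the sketch below builds a feature dictionary for every token from its own attributes and those of its neighbours. The attribute names, the window size, and the helper functions `word_attributes` and `token_features` are assumptions made for this example and are not taken from the paper; features derived from existing gene/protein taggers are left out. The resulting dictionaries could then be vectorized and passed to an SVM implementation of the reader's choice.

```python
def word_attributes(word):
    """Orthographic attributes of one token (illustrative set, not the paper's)."""
    return {
        "lower": word.lower(),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_upper": any(ch.isupper() for ch in word),
        "has_hyphen": "-" in word,
        "is_all_caps": word.isupper(),
        "suffix3": word[-3:].lower(),
    }


def token_features(tokens, i, window=1):
    """Features for tokens[i]: its attributes plus those of neighbours within `window`.

    Keys are prefixed with the signed offset, e.g. "-1:lower" or "+0:has_digit".
    Positions that fall outside the sentence get a single padding feature.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            for name, value in word_attributes(tokens[j]).items():
                feats[f"{offset:+d}:{name}"] = value
        else:
            feats[f"{offset:+d}:pad"] = True
    return feats


# Hypothetical example sentence fragment; each token gets its own feature dict.
tokens = ["IL-2", "gene", "expression"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Because every token is classified independently, an entity that spans several words has to be reassembled from the per-word labels afterwards; how the paper does this is not stated in the abstract.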