Enhancing HMM-based biomedical named entity recognition by studying special phenomena

  • Authors:
  • Jie Zhang;Dan Shen;Guodong Zhou;Jian Su;Chew-Lim Tan

  • Affiliations:
  • Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore and Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Si ...;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore and Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Si ...;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore;Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore

  • Venue:
  • Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
  • Year:
  • 2004

Abstract

The purpose of this research is to enhance an HMM-based named entity recognizer in the biomedical domain. First, we analyze the characteristics of biomedical named entities. Then, we propose a rich set of features, including orthographic, morphological, part-of-speech, and semantic trigger features. All these features are integrated via a Hidden Markov Model with back-off modeling. Furthermore, we propose a method for biomedical abbreviation recognition and two methods for cascaded named entity recognition. Evaluation on GENIA V3.02 and V1.1 shows that our system achieves F-measures of 66.5 and 62.5, respectively, and outperforms the previous best published system by 8.1 F-measure points under the same experimental setting. The major contributions of this paper are its rich feature set, specially designed for the biomedical domain, and its effective methods for abbreviation recognition and cascaded named entity recognition. To the best of our knowledge, our system is the first to cope with cascaded phenomena.
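
As a rough illustration of what token-level orthographic features can look like in biomedical text, here is a minimal Python sketch. The abstract does not define the feature set, so the function names (`word_shape`, `orthographic_features`), the individual feature names, and the short Greek-letter list are all assumptions made for this example and should not be read as the paper's actual features.

```python
# Illustrative only: feature names and the Greek-letter list are assumptions,
# not the feature set defined in the paper.
GREEK_LETTERS = {"alpha", "beta", "gamma", "delta", "kappa"}


def word_shape(token: str) -> str:
    """Map uppercase to 'A', lowercase to 'a', digits to '0'; keep other characters."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("0")
        elif ch.isalpha():
            shape.append("A" if ch.isupper() else "a")
        else:
            shape.append(ch)
    return "".join(shape)


def orthographic_features(token: str) -> dict:
    """Return a few simple surface cues for a single token."""
    return {
        "shape": word_shape(token),
        "init_cap": token[:1].isupper(),
        "all_caps": token.isalpha() and token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "greek_letter": token.lower() in GREEK_LETTERS,
    }


if __name__ == "__main__":
    for tok in ["IL-2", "NF-kappaB", "kappa", "protein"]:
        print(tok, orthographic_features(tok))
```

In an HMM-based tagger, cues like these could be attached to each token as part of its observation. How the paper combines them with the morphological, part-of-speech, and semantic trigger features is not specified in the abstract.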