A priority model for named entities

  • Authors:
  • Lorraine Tanabe;W. John Wilbur

  • Affiliations:
  • National Center for Biotechnology Information, Bethesda, MD;National Center for Biotechnology Information, Bethesda, MD

  • Venue:
  • BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We introduce a new approach to named entity classification which we term a Priority Model. We also describe the construction of a semantic database called SemCat consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared the results with the new model. For all three methods, we used a variable order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958--0.960, consistently higher than the statistical language model and probabilistic context-free grammar.