A joint model for normalizing gene and organism mentions in text

  • Authors:
  • Georgi Georgiev;Preslav Nakov;Kuzman Ganchev;Deyan Peychev;Vassil Momchev

  • Affiliations:
  • Ontotext AD, Sofia, Bulgaria;National University of Singapore, Singapore;University of Pennsylvania, Philadelphia, PA;Ontotext AD, Sofia, Bulgaria;Ontotext AD, Sofia, Bulgaria

  • Venue:
  • WBIE '09 Proceedings of the Workshop on Biomedical Information Extraction
  • Year:
  • 2009

Abstract

The aim of gene mention normalization is to propose an appropriate canonical name, or an identifier from a popular database, for a gene or gene product mentioned in a given piece of text. The task has attracted considerable research attention for several organisms, under the assumption that both the mention boundaries and the target organism are known. Here we extend the task to also recognize whether a gene mention is valid and to identify the organism it comes from. We solve this extended task using a joint model for gene and organism name normalization that allows instances from different organisms to share features, thus achieving sizable performance gains with different learning methods: Naïve Bayes, Maximum Entropy, Perceptron and MIRA, as well as averaged versions of the last two. The evaluation results for our joint classifier show an F1 score of over 97%, which demonstrates the potential of the approach.
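
To illustrate the idea of sharing features across organisms, here is a minimal sketch. It is not the authors' implementation. The abstract does not describe the feature set, so the feature names, the toy data, and the scheme of emitting each feature once in a shared form and once in an organism-specific form are all assumptions. The learner is a plain averaged perceptron, one of the methods the abstract lists.

```python
from collections import defaultdict


def joint_features(base_features, organism):
    """Emit each base feature twice: a copy shared by all organisms and a
    copy conjoined with the organism name (hypothetical scheme)."""
    feats = []
    for f in base_features:
        feats.append("shared:" + f)
        feats.append(organism + ":" + f)
    return feats


class AveragedPerceptron:
    """Binary averaged perceptron over sparse string features.
    Labels are +1 (valid mention) and -1 (invalid mention)."""

    def __init__(self):
        self.w = defaultdict(float)       # current weights
        self.totals = defaultdict(float)  # running sum of weights per step
        self.steps = 0
        self.avg = {}                     # averaged weights, set by train()

    def _score(self, feats, weights):
        return sum(weights.get(f, 0.0) for f in feats)

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for feats, label in data:
                self.steps += 1
                pred = 1 if self._score(feats, self.w) > 0 else -1
                if pred != label:
                    # Standard perceptron update on a mistake.
                    for f in feats:
                        self.w[f] += label
                # Naive averaging: add the current weights after every step.
                for f, v in self.w.items():
                    self.totals[f] += v
        self.avg = {f: t / self.steps for f, t in self.totals.items()}

    def predict(self, feats):
        return 1 if self._score(feats, self.avg) > 0 else -1


# Toy training data (invented for illustration only).
data = [
    (joint_features(["ctx=gene", "upper"], "human"), 1),
    (joint_features(["ctx=gene", "upper"], "mouse"), 1),
    (joint_features(["ctx=figure", "lower"], "human"), -1),
    (joint_features(["ctx=figure", "lower"], "mouse"), -1),
]

model = AveragedPerceptron()
model.train(data)

# An organism never seen in training still benefits from the shared copies.
yeast_valid = model.predict(joint_features(["ctx=gene", "upper"], "yeast"))
yeast_invalid = model.predict(joint_features(["ctx=figure", "lower"], "yeast"))
```

In this sketch the shared copies let evidence learned from one organism transfer to another, while the organism-specific copies leave room for per-organism corrections. Whether the paper's joint model works this way is not stated in the abstract.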