A machine learning-based system to normalise gene mentions to unique database identifiers

  • Authors:
  • Yifei Chen; Feng Liu; Bernard Manderick

  • Affiliations:
  • School of Information Sciences, Nanjing Audit University, Nanjing 211815, China / Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Be ...; Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium; Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2011

Abstract

In this paper, we propose an integrated gene normaliser (GNer) that assigns a unique database identifier to each recognised gene mention in the biological literature. GNer combines Support Vector Machines (SVMs) with rule-based components. First, we construct a dictionary from EntrezGene and BioThesaurus. Then a purpose-built pre-processor reduces the variation and ambiguity among synonyms. Finally, an SVM-based disambiguation filter resolves the ambiguity that remains after exact dictionary matching. In our experiments, GNer achieves a precision of 80.5%, a recall of 86.4% and an Fβ=1 score of 83.4.
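The abstract describes a dictionary-then-filter pipeline. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The lexicon, identifiers and context profiles are hypothetical toy data rather than real EntrezGene or BioThesaurus entries. The rules in `normalise` are assumed examples of what a synonym pre-processor might do. The SVM filter is replaced by a simple context-overlap heuristic (`disambiguate`) so that the example runs with the standard library alone.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon standing in for a dictionary built from EntrezGene
# and BioThesaurus: identifier -> synonyms. IDs and synonyms are illustrative.
TOY_LEXICON = {
    "GENE:0001": ["IL-2", "interleukin 2", "IL2"],
    "GENE:0002": ["TNF-alpha", "TNF alpha", "tumor necrosis factor"],
    "GENE:0003": ["p53", "TP53", "tumor protein p53"],
    "GENE:0004": ["p53", "P53 binding protein"],  # deliberately shares "p53"
}

# Assumed pre-processing rule: spelled-out Greek letters map to one letter.
GREEK = {"alpha": "a", "beta": "b", "gamma": "g", "delta": "d"}

# Hypothetical context profiles: a stand-in for the features a trained
# disambiguation classifier (an SVM in the paper) would use.
TOY_CONTEXT = {
    "GENE:0003": {"tumour", "suppressor", "apoptosis", "dna"},
    "GENE:0004": {"binding", "interaction", "partner"},
}


def normalise(name):
    """Lower-case, split into letter/digit runs, map Greek names, and
    concatenate, so 'IL-2', 'IL 2' and 'il2' all become 'il2'."""
    tokens = re.findall(r"[a-z]+|[0-9]+", name.lower())
    return "".join(GREEK.get(t, t) for t in tokens)


def build_index(lexicon):
    """Map each normalised synonym to the set of identifiers that share it."""
    index = defaultdict(set)
    for gene_id, synonyms in lexicon.items():
        for synonym in synonyms:
            index[normalise(synonym)].add(gene_id)
    return index


def disambiguate(candidates, context_words):
    """Placeholder for the SVM filter: pick the candidate whose toy context
    profile overlaps most with the mention's context; None on a tie or no overlap."""
    scores = {c: len(TOY_CONTEXT.get(c, set()) & context_words) for c in candidates}
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def normalise_mention(mention, context, index):
    """Exact-match the normalised mention; disambiguate only if several IDs match."""
    candidates = index.get(normalise(mention), set())
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))
    context_words = set(re.findall(r"[a-z]+", context.lower()))
    return disambiguate(sorted(candidates), context_words)


if __name__ == "__main__":
    index = build_index(TOY_LEXICON)
    print(normalise_mention("Il 2", "", index))  # GENE:0001
    print(normalise_mention("p53", "p53 induces apoptosis after DNA damage", index))  # GENE:0003
```

The point the sketch tries to capture is that normalisation happens before lookup, so the matching step itself can stay exact. Ambiguity then appears only as a normalised key shared by several identifiers, which is the case a trained classifier such as the paper's SVM filter would handle in place of the overlap score.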
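Reading the reported Fβ figure as the balanced F-score (β = 1), the standard formula reproduces it from the stated precision P and recall R:

```latex
F_{\beta} = \frac{(1+\beta^{2})\,P\,R}{\beta^{2}P + R},
\qquad
F_{1} = \frac{2PR}{P+R}
      = \frac{2 \times 0.805 \times 0.864}{0.805 + 0.864}
      = \frac{1.39104}{1.669}
      \approx 0.833
```

The gap of about 0.1 points to the reported 83.4 is within what rounding precision and recall to one decimal place can produce. With β = 2, by contrast, the same P and R would give roughly 0.85, so the figure is consistent with β = 1.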