Combining multiple disambiguation methods for gene mention normalization

Authors:
Ning Xia;Hongfei Lin;Zhihao Yang;Yanpeng Li
Affiliations:
Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116023, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116023, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116023, China;Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116023, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

The rapid growth of biomedical literature has prompted the biomedical text mining community to explore methods for accessing and managing this ever-increasing body of knowledge. One important task in biomedical text mining is gene mention normalization, which recognizes the biomedical entities in biomedical texts and maps each gene mention discussed in the text to a unique organic database identifier. In this work, we employ an information retrieval based method that extracts a gene mention's semantic profile from PubMed abstracts for gene mention disambiguation. This disambiguation method focuses on generating a more comprehensive representation of the gene mention rather than relying on organic clues such as gene ontology, which have fewer co-occurrences with the gene mention. Furthermore, we use an existing biomedical resource as a second disambiguation method. We then extract features from the output of the gene mention detection system to build a false positive filter based on documents retrieved from Wikipedia. Our system achieved an F-measure of 83.1% on the BioCreative II GN test data.
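
The profile-based disambiguation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the term weighting or the similarity measure, so plain term counts and cosine similarity are assumed here. The function names (`build_profile`, `disambiguate`), the candidate identifiers `ID_A` and `ID_B`, and the toy abstracts are all hypothetical and stand in for real database identifiers and retrieved PubMed abstracts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profile(abstracts):
    """Build a bag-of-words profile for one candidate identifier.

    Assumption: raw term counts over the candidate's abstracts; the
    paper may use a different weighting scheme.
    """
    profile = Counter()
    for abstract in abstracts:
        profile.update(tokenize(abstract))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, candidate_profiles):
    """Return the candidate whose profile is most similar to the context."""
    context_vec = Counter(tokenize(context))
    scores = {
        gene_id: cosine(context_vec, profile)
        for gene_id, profile in candidate_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): two candidate identifiers sharing one gene symbol.
profiles = {
    "ID_A": build_profile([
        "regulates insulin secretion in pancreatic beta cells",
        "glucose metabolism and insulin signaling",
    ]),
    "ID_B": build_profile([
        "expressed in neurons during axon guidance",
        "synaptic development in the brain",
    ]),
}

context = "The protein modulates insulin release from pancreatic islets"
best, scores = disambiguate(context, profiles)
print(best, scores)
```

In this toy setting the context shares more terms with the first profile, so that candidate receives the higher score. A real system would draw the profiles from retrieved abstracts and would likely combine this score with the other evidence mentioned in the abstract, such as the existing biomedical resource and the false positive filter.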