Gene symbol disambiguation using knowledge-based profiles

Authors:
Hua Xu;Jung-Wei Fan;George Hripcsak;Eneida A. Mendonça;Marianthi Markatou;Carol Friedman
Venue:
Bioinformatics
Year:
2007

Citing 0
Cited 13

Integrative mining of traditional Chinese medicine literature and MEDLINE for functional gene networks. Artificial Intelligence in Medicine
A system for finding biological entities that satisfy certain conditions from texts. Proceedings of the 17th ACM conference on Information and knowledge management
Knowledge-based gene symbol disambiguation. Proceedings of the 2nd international workshop on Data and text mining in bioinformatics
Disambiguation of biomedical abbreviations. BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
What's in a gene name?: automated refinement of gene name dictionaries. BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Exploring Species-Based Strategies for Gene Normalization. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Disambiguation in the biomedical domain: The role of ambiguity type. Journal of Biomedical Informatics
Combining multiple disambiguation methods for gene mention normalization. Expert Systems with Applications: An International Journal
Disambiguation of MEDLINE abstracts using topic models. Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics
Extracting and normalizing gene/protein mentions with the flexible and trainable Moara Java library. ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
Mixed-sampling approach to unbalanced data distributions: a case study involving Leukemia's document profiling. WSEAS Transactions on Information Science and Applications
Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies. Journal of Biomedical Informatics
Scaling up WSD with automatically generated examples. BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Abstract

Motivation: The ambiguity of biomedical entities, particularly of gene symbols, is a major challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information about the characteristics of individual genes that could be used to disambiguate gene symbols.

Results: For each gene, we create a profile containing different types of information automatically extracted from related MEDLINE abstracts and readily available annotated knowledge sources. We apply the gene profiles to the disambiguation task via an information retrieval method that ranks the similarity scores between the context in which the ambiguous gene symbol is mentioned and each candidate gene profile. The gene whose profile has the highest similarity score is chosen as the correct sense. We evaluated the method on three automatically generated test sets for mouse, fly and yeast. The highest precision achieved was 93.9% for mouse, 77.8% for fly and 89.5% for yeast.

Availability: The testing data sets and disambiguation programs are available at http://www.dbmi.columbia.edu/~hux7002/gsd2006

Contact: friedman@dbmi.columbia.edu
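
The ranking step described in the Results can be illustrated with a minimal sketch. The abstract does not specify the term weighting or the similarity measure, so the raw term counts and cosine similarity used here are assumptions, and the function names (`tokenize`, `cosine`, `disambiguate`), gene identifiers and profile texts are hypothetical toy values rather than the authors' implementation or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, profiles):
    """Score every candidate profile against the context and pick the best.

    `profiles` maps a candidate gene identifier to its profile text.
    Returns the identifier with the highest score and all scores.
    """
    context_vec = Counter(tokenize(context))
    scores = {
        gene_id: cosine(context_vec, Counter(tokenize(profile_text)))
        for gene_id, profile_text in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy profiles for two genes that share one ambiguous symbol.
profiles = {
    "gene_A": "insulin secretion pancreas glucose metabolism",
    "gene_B": "neuron axon synapse brain development",
}
context = (
    "Expression of the gene was reduced in pancreas tissue "
    "with impaired glucose tolerance"
)

best, scores = disambiguate(context, profiles)
print(best, scores)
```

In this toy case the context shares terms only with the first profile, so that candidate is ranked highest. A fuller system would presumably build profiles from many abstracts and annotation sources and apply a weighting scheme such as TF-IDF, but those details are not given in the abstract.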