Identifying Gene and Protein Names from Biological Texts

Authors:
Weijian Xuan; Stanley J. Watson; Huda Akil; Fan Meng
Venue:
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Year:
2003

Abstract

Extracting and identifying gene and protein names from literature is a critical step for mining functional information of genes and proteins. While extensive efforts have been devoted to this important task, most of them aimed at extracting gene/protein names per se, without paying much attention to associating the extracted names with existing gene and protein database entries. We developed a simple and efficient method to identify gene and protein names in literature using a combination of heuristic and statistical strategies. Our approach maps the extracted names to individual LocusLink entries, thus enabling the seamless integration of literature information with existing gene/protein databases. Evaluation on a test corpus shows that our method can achieve both high recall and precision. Our method exhibits good performance and can be used as a building block for large biomedical literature mining systems.