GAPSCORE: finding gene and protein names one word at a time

Authors:
Jeffrey T. Chang; Hinrich Schütze; Russ B. Altman
Affiliations:
Department of Genetics, Stanford Medical Center, 300 Pasteur Drive, Lane L 301, Mail Code 5120, Stanford, CA 94305-5120, USA; Enkata Technologies, 2121 South El Camino Real, Suite 1200, San Mateo, CA 94403-1855, USA; Department of Genetics, Stanford Medical Center, 300 Pasteur Drive, Lane L 301, Mail Code 5120, Stanford, CA 94305-5120, USA
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much of this knowledge is still stored as written natural language text. To help extract it, we have developed GAPSCORE, a new method that identifies gene and protein names in text. GAPSCORE scores each word with a statistical model of gene names that quantifies the word's appearance, morphology and context.

Results: We evaluated GAPSCORE on the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Because the method is statistical, users can choose score cutoffs that tune its performance to their needs.

Availability: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/
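The abstract does not specify the model itself, so the following is only a minimal sketch of the general idea, not GAPSCORE's implementation. Each word receives a few hypothetical binary features grouped as appearance, morphology and context. A made-up logistic weighting turns those features into a score between 0 and 1, and a user-chosen cutoff decides which words are tagged. The feature names, weights, suffix list, cue words, and the functions `features`, `score` and `tag` are all illustrative assumptions.

```python
import math

# Toy parameters for illustration only: these are NOT GAPSCORE's features or weights.
WEIGHTS = {
    "has_digit": 1.2,      # appearance: e.g. "p53", "BRCA1"
    "inner_upper": 1.5,    # appearance: a capital letter after the first character
    "has_hyphen": 0.4,     # appearance: e.g. "IL-2"
    "name_suffix": 1.0,    # morphology: endings such as "-ase" or "-in"
    "context_word": 1.8,   # context: a nearby cue word such as "gene"
}
BIAS = -2.5
SUFFIXES = ("ase", "ases", "in", "ins")  # hypothetical morphology cues
CUE_WORDS = {"gene", "protein", "receptor", "kinase", "expression"}  # hypothetical context cues


def features(tokens, i, window=2):
    """Binary appearance, morphology and context features for tokens[i]."""
    word = tokens[i]
    # Up to `window` words on each side of the current word.
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return {
        "has_digit": any(c.isdigit() for c in word),
        "inner_upper": any(c.isupper() for c in word[1:]),
        "has_hyphen": "-" in word,
        "name_suffix": len(word) > 3 and word.lower().endswith(SUFFIXES),
        "context_word": any(n.lower() in CUE_WORDS for n in neighbours),
    }


def score(tokens, i):
    """Logistic score in (0, 1); higher means more name-like under the toy weights."""
    z = BIAS + sum(WEIGHTS[name] for name, on in features(tokens, i).items() if on)
    return 1.0 / (1.0 + math.exp(-z))


def tag(tokens, cutoff=0.5):
    """Return the words whose score reaches the user-chosen cutoff."""
    return [w for i, w in enumerate(tokens) if score(tokens, i) >= cutoff]


if __name__ == "__main__":
    sentence = "the BRCA1 gene encodes a nuclear phosphoprotein".split()
    for cutoff in (0.5, 0.3):
        print(cutoff, tag(sentence, cutoff))
```

With these toy weights, a cutoff of 0.5 tags only `BRCA1`. Lowering it to 0.3 also admits `the`, `encodes` and `a`, which score higher merely because they sit next to the cue word "gene". This illustrates the kind of trade a user makes when choosing a cutoff: a lower threshold can recover more true names, at the cost of more false positives.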