Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context.

Results: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs.

Availability: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/
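
The reported F-scores and the remark about score cutoffs can be illustrated with a small Python sketch. It assumes the F-score is the balanced harmonic mean of precision and recall, which matches the reported figures to within rounding. The function names, the `(score, is_gene_name)` data format and the toy scores are hypothetical and are not taken from GAPSCORE itself.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_at_cutoff(scored_words, cutoff):
    """Return (precision, recall) when words scoring >= cutoff are called gene names.

    scored_words: list of (score, is_gene_name) pairs (hypothetical format).
    """
    predicted = [is_gene for score, is_gene in scored_words if score >= cutoff]
    true_positives = sum(predicted)
    total_positives = sum(is_gene for _, is_gene in scored_words)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Made-up scores for illustration only; these are not GAPSCORE output.
toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]

for cutoff in (0.3, 0.7):
    p, r = evaluate_at_cutoff(toy, cutoff)
    print(cutoff, round(p, 2), round(r, 2), round(f_score(p, r), 2))
```

On the toy data, the lower cutoff accepts more words and favours recall, while the higher cutoff accepts fewer and favours precision. This is the kind of trade-off a user-chosen cutoff controls.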