A priority model for named entities

Authors:
Lorraine Tanabe;W. John Wilbur
Affiliations:
National Center for Biotechnology Information, Bethesda, MD;National Center for Biotechnology Information, Bethesda, MD
Venue:
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Year:
2006

Abstract

We introduce a new approach to named entity classification which we term a Priority Model. We also describe the construction of a semantic database called SemCat consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared the results with the new model. For all three methods, we used a variable-order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958 to 0.960, consistently higher than the statistical language model and probabilistic context-free grammar.
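
For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how such a score is commonly computed from classification counts. The abstract does not say which variant was used, so the balanced form (beta = 1) is an assumption here, and the function name `f_measure` and the example counts are hypothetical, not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true positives, false positives, and false negatives.

    beta = 1.0 gives the balanced F-measure (harmonic mean of precision and recall).
    This is an assumed variant; the abstract does not state which one was used.
    """
    if tp == 0:
        # With no true positives, precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(round(f_measure(tp=90, fp=10, fn=10), 3))
```

With beta = 1 the score weights precision and recall equally, so a classifier has to do well on both to reach a value close to 1.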