A priority model for named entities

Authors:
Lorraine Tanabe;W. John Wilbur
Affiliations:
National Center for Biotechnology Information, Bethesda, MD;National Center for Biotechnology Information, Bethesda, MD
Venue:
LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Year:
2006

Abstract

We introduce a new approach to named entity classification that we term a Priority Model. We also describe the construction of a semantic database called SemCat, consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared the results with the new model. For all three methods, we used a variable-order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958 to 0.960, consistently higher than both the statistical language model and the probabilistic context-free grammar.
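To illustrate the idea of using a variable-order Markov model on strings absent from the training data, here is a minimal sketch. It is not the authors' implementation: the class name, the "^" and "$" boundary markers, the maximum order of 3, the add-one smoothing, the alphabet size of 256, and the toy training names (which stand in for SemCat) are all assumptions made for illustration. One character-level model is trained per category, and an unseen string is assigned to the category whose model scores it highest.

```python
import math
from collections import defaultdict


class VariableOrderMarkov:
    """Character-level Markov model that uses the longest context seen in training."""

    def __init__(self, max_order=3, alphabet_size=256):
        self.max_order = max_order
        self.alphabet_size = alphabet_size  # assumed symbol inventory for smoothing
        # counts[context][char] = times `char` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def _pad(self, s):
        # "^" marks the start of a name and "$" its end (hypothetical markers)
        return "^" * self.max_order + s + "$"

    def train(self, names):
        for name in names:
            padded = self._pad(name)
            for i in range(self.max_order, len(padded)):
                # record every context length from 0 up to max_order
                for k in range(self.max_order + 1):
                    self.counts[padded[i - k:i]][padded[i]] += 1

    def char_prob(self, context, ch):
        # back off from the longest context to shorter ones until one was seen
        for k in range(len(context), -1, -1):
            followers = self.counts.get(context[len(context) - k:])
            if followers:
                total = sum(followers.values())
                # add-one smoothing keeps unseen characters above zero
                return (followers.get(ch, 0) + 1) / (total + self.alphabet_size)
        return 1.0 / self.alphabet_size  # untrained model: uniform guess

    def log_score(self, s):
        padded = self._pad(s)
        return sum(
            math.log(self.char_prob(padded[i - self.max_order:i], padded[i]))
            for i in range(self.max_order, len(padded))
        )


def classify(s, models):
    # pick the category whose model assigns the highest log-probability
    return max(models, key=lambda label: models[label].log_score(s))


# Toy training data (illustrative only, not drawn from SemCat)
models = {"gene": VariableOrderMarkov(), "drug": VariableOrderMarkov()}
models["gene"].train(["BRCA1", "BRCA2", "TP53", "CDK2", "CDK4"])
models["drug"].train(["aspirin", "ibuprofen", "naproxen"])

print(classify("CDK6", models))
```

The string "CDK6" never appears in the toy training lists, so its category is decided entirely by how well each model's character contexts predict it. A real system would need far more training names and a more careful smoothing scheme than add-one.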