We introduce a new approach to named entity classification, which we term a Priority Model. We also describe the construction of SemCat, a semantic database consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared their results with those of the new model. For all three methods, we used a variable-order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958 to 0.960, consistently higher than both the statistical language model and the probabilistic context-free grammar.
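To make the idea of classifying names with a statistical language model concrete, the sketch below trains one character-level model per semantic category and assigns a name to the category whose model gives it the highest probability. This is an illustration under stated assumptions, not the method used in the work: the abstract does not specify the form of the language model, so the bigram order, the add-one smoothing, the class name `CharBigramModel`, the function `classify`, and the toy training names are all hypothetical. It also does not implement the Priority Model or the variable-order Markov model.

```python
import math
from collections import Counter


class CharBigramModel:
    """Character bigram model with add-one smoothing (illustrative assumption)."""

    def __init__(self, names):
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of the previous char alone
        self.vocab = set()         # characters seen in training, plus padding
        for name in names:
            padded = "^" + name.lower() + "$"  # mark start and end of the name
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name, vocab_size):
        """Smoothed log-probability of a name under this model."""
        padded = "^" + name.lower() + "$"
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(name, models):
    """Return the label whose model assigns the name the highest log-probability."""
    # Shared vocabulary size, plus one slot for characters never seen in training.
    vocab_size = len(set().union(*(m.vocab for m in models.values()))) + 1
    return max(models, key=lambda label: models[label].log_prob(name, vocab_size))


# Hypothetical toy training data; a real system would train on a large
# categorized resource such as the SemCat database described above.
models = {
    "gene_or_protein": CharBigramModel(["brca1", "tp53", "cdk2", "il-6 receptor"]),
    "other": CharBigramModel(["aspirin", "headache", "diabetes", "blood pressure"]),
}

print(classify("cdk4", models))
```

Because each category has its own model, a name absent from the training data can still be scored from its character patterns. Handling such unseen strings is the role the abstract assigns to the variable-order Markov model, which this fixed-order sketch only approximates.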