A joint model for normalizing gene and organism mentions in text

Authors:
Georgi Georgiev (Ontotext AD, Sofia, Bulgaria); Preslav Nakov (National University of Singapore, Singapore); Kuzman Ganchev (University of Pennsylvania, Philadelphia, PA); Deyan Peychev (Ontotext AD, Sofia, Bulgaria); Vassil Momchev (Ontotext AD, Sofia, Bulgaria)
Venue:
WBIE '09: Proceedings of the Workshop on Biomedical Information Extraction
Year:
2009

Abstract

The aim of gene mention normalization is to propose an appropriate canonical name, or an identifier from a widely used database, for a gene or gene product mentioned in a given piece of text. The task has attracted considerable research attention for several organisms, under the assumption that both the mention boundaries and the target organism are known. Here we extend the task to also recognize whether the gene mention is valid and to identify the organism it comes from. We solve this extended task using a joint model for gene and organism name normalization that allows instances from different organisms to share features. This yields sizable performance gains with several learning methods: Naïve Bayes, Maximum Entropy, Perceptron and MIRA, as well as averaged versions of the last two. Our joint classifier achieves an F1 score of over 97%, which demonstrates the potential of the approach.
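The abstract does not say how instances from different organisms share features. The following is a minimal sketch of one way this could work. It assumes a feature-augmentation scheme: each base feature is emitted once in an organism-independent ("shared") form and once conjoined with the organism. A generic binary averaged perceptron, one of the learners listed above, is then trained on (mention, candidate) pairs labeled valid (+1) or invalid (-1). The function joint_features, the feature names, and the toy data are hypothetical illustrations and are not the authors' implementation.

```python
from collections import defaultdict


def joint_features(base_features, organism):
    """Hypothetical scheme: emit each base feature both as a shared feature
    (usable by every organism) and as an organism-specific conjunction."""
    feats = []
    for f in base_features:
        feats.append("shared:" + f)
        feats.append(organism + ":" + f)
    feats.append("bias")
    return feats


class AveragedPerceptron:
    """Binary averaged perceptron over sparse string features, labels +1/-1.
    Averaging uses the standard trick: avg = w - u / c."""

    def __init__(self):
        self.w = defaultdict(float)  # current weights
        self.u = defaultdict(float)  # running sum of (counter * update)
        self.c = 1                   # example counter
        self.avg = {}

    def score(self, feats, weights=None):
        weights = self.w if weights is None else weights
        return sum(weights.get(f, 0.0) for f in feats)

    def update(self, feats, y):
        # Mistake-driven update: change weights only on a wrong or zero margin.
        if y * self.score(feats) <= 0:
            for f in feats:
                self.w[f] += y
                self.u[f] += y * self.c
        self.c += 1

    def fit(self, examples, epochs=5):
        for _ in range(epochs):
            for feats, y in examples:
                self.update(feats, y)
        # Averaged weights over all updates seen during training.
        self.avg = {f: self.w[f] - self.u[f] / self.c for f in self.w}
        return self

    def predict(self, feats):
        return 1 if self.score(feats, self.avg) > 0 else -1


# Toy data: train on human and mouse, then test on an unseen organism (fly).
train = [
    (joint_features(["exact_match"], "human"), +1),
    (joint_features(["no_match"], "human"), -1),
    (joint_features(["exact_match"], "mouse"), +1),
    (joint_features(["no_match"], "mouse"), -1),
]
model = AveragedPerceptron().fit(train, epochs=5)
print(model.predict(joint_features(["exact_match"], "fly")))  # 1
print(model.predict(joint_features(["no_match"], "fly")))     # -1
```

The model has never seen a fly-specific feature, so it classifies the fly mentions only through the shared weights learned from the other organisms. This is the kind of transfer the abstract attributes to its joint model. With purely organism-specific features, both fly examples would receive the same score.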