The aim of gene mention normalization is to propose an appropriate canonical name, or an identifier from a popular database, for a gene or gene product mentioned in a given piece of text. The task has attracted considerable research attention for several organisms, under the assumption that both the mention boundaries and the target organism are known. Here we extend the task to also recognize whether the gene mention is valid and to determine which organism it comes from. We solve this extended task using a joint model for gene and organism name normalization. Because the model allows instances from different organisms to share features, it achieves sizable performance gains with several learning methods: Naïve Bayes, Maximum Entropy, Perceptron and MIRA, as well as averaged versions of the last two. Our joint classifier achieves an F1 score of over 97%, which demonstrates the potential of the approach.
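The abstract does not say how the joint model lets organisms share features. One common way to get that effect, used here purely as an assumption, is to emit every base feature twice: once on its own, so that its weight is shared across organisms, and once conjoined with the organism name. The minimal sketch below also assumes the task is framed as a binary valid/invalid decision over (mention, candidate name, organism) triples. It trains either an averaged perceptron or an averaged single-constraint MIRA update, which in this one-constraint form coincides with the PA-I passive-aggressive rule. The functions `featurize`, `score` and `train`, the feature names, and the toy gene pairs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def featurize(mention, candidate, organism):
    """Hypothetical features for a (mention, candidate name, organism) triple.

    Each base feature is emitted twice: once on its own, so its weight is
    shared by instances from every organism, and once conjoined with the
    organism, so organism-specific behaviour can still be learned.
    """
    m, c = mention.lower(), candidate.lower()
    base = {
        "exact": 1.0 if m == c else 0.0,
        "prefix3": 1.0 if m[:3] == c[:3] else 0.0,
        "len_diff": abs(len(m) - len(c)) / max(len(m), len(c), 1),
        "bias": 1.0,
    }
    feats = {}
    for name, value in base.items():
        feats[name] = value                  # shared across organisms
        feats[f"{organism}|{name}"] = value  # organism-specific copy
    return feats


def score(w, feats):
    # Features never seen in training (e.g. a new organism's copies) count as zero.
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def train(data, epochs=10, update="mira", C=1.0):
    """Averaged binary perceptron or single-constraint MIRA (PA-I).

    data: list of (feats, y) pairs with y in {+1, -1}.
    Returns the weights averaged over every training step.
    """
    w = defaultdict(float)
    total = defaultdict(float)  # running sum of weight vectors, for averaging
    steps = 0
    for _ in range(epochs):
        for feats, y in data:
            margin = y * score(w, feats)
            if update == "perceptron":
                # Fixed-size step whenever the example is misclassified or on the boundary.
                tau = 1.0 if margin <= 0 else 0.0
            else:
                # Smallest step that brings the margin up to 1 (zero hinge loss), capped at C.
                loss = max(0.0, 1.0 - margin)
                sq_norm = sum(v * v for v in feats.values())
                tau = min(C, loss / sq_norm) if sq_norm > 0 else 0.0
            if tau:
                for f, v in feats.items():
                    w[f] += tau * y * v
            for f, v in w.items():
                total[f] += v
            steps += 1
    return {f: v / steps for f, v in total.items()}


# Toy, made-up training triples: +1 = valid normalization, -1 = invalid.
pairs = [
    ("BRCA1", "BRCA1", "human", +1),
    ("BRCA1", "BRCA2", "human", -1),
    ("Brca1", "Brca1", "mouse", +1),
    ("Brca1", "Trp53", "mouse", -1),
]
data = [(featurize(m, c, o), y) for m, c, o, y in pairs]
w = train(data, update="mira")

# An organism absent from training is scored only through the shared features.
print(score(w, featurize("Tp53", "Tp53", "rat")) > 0)
```

The feature duplication is what produces sharing in this sketch: the unconjoined `exact` weight is updated by both the human and the mouse examples, so it already carries signal for an organism with no training data. Passing `update="perceptron"` swaps in the averaged perceptron without changing anything else.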