Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
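To make the alignment idea concrete, the following is a minimal sketch of ranking candidate names against a query by a global alignment score computed with dynamic programming. It is not the system described above: the scoring values (`match`, `mismatch`, `gap`) are fixed, illustrative assumptions, whereas the described system lets the mutation matrix vary under the control of a trained hidden Markov model. The function names `alignment_score` and `rank_names` are hypothetical.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch style) alignment score of two strings.

    The fixed match/mismatch/gap values are assumptions for illustration;
    they stand in for the HMM-controlled mutation matrix in the abstract.
    """
    a, b = a.lower(), b.lower()
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


def rank_names(query, names, top=3):
    """Return the `top` names from `names` with the highest alignment score."""
    scored = [(alignment_score(query, name), name) for name in names]
    # Sort by descending score, breaking ties alphabetically for stability.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top]]


if __name__ == "__main__":
    candidates = ["interleukin 2", "IL2", "p53"]
    print(rank_names("IL-2", candidates, top=1))
```

With fixed scores, a spelling variant such as "IL2" aligns closely to "IL-2", but the long form "interleukin 2" is penalized heavily for its length. A trainable, context-dependent mutation matrix is one way to handle that kind of variation.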