An Algorithm that Learns What‘s in a Name
Machine Learning - Special issue on natural language learning
Constructing Biological Knowledge Bases by Extracting Information from Text Sources
Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
Rutabaga by any other name: extracting biological names
Journal of Biomedical Informatics - Special issue: Sublanguage
Extracting the names of genes and gene products with a hidden Markov model
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Tuning support vector machines for biomedical named entity recognition
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Contrast and variability in gene names
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Medstract: creating large-scale information servers for biomedical libraries
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Protein name tagging for biomedical annotation in text
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Introduction: named entity recognition in biomedicine
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
A stopping criterion for active learning
Computer Speech and Language
Rule-Based Protein Term Identification with Help from Automatic Species Tagging
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Human gene name normalization using text matching with automatically extracted synonym dictionaries
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Semi-supervised anaphora resolution in biomedical texts
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Summarizing key concepts using citation sentences
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Bootstrapping and evaluating named entity recognition in the biomedical domain
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Accelerating the annotation of sparse named entities by dynamic sentence selection
BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Annotation of chemical named entities
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Evaluating and combining biomedical named entity recognition systems
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Natural Language Processing as a Foundation of the Semantic Web
Foundations and Trends in Web Science
Weakly supervised learning methods for improving the quality of gene name normalization data
ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics
Adaptive string similarity metrics for biomedical reference resolution
ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics
Corpus design for biomedical natural language processing
ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics
Human gene name normalization using text matching with automatically extracted synonym dictionaries
LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Semi-supervised anaphora resolution in biomedical texts
LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Summarizing key concepts using citation sentences
LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Bootstrapping and evaluating named entity recognition in the biomedical domain
LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Distant supervision for relation extraction without labeled data
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
A joint model for normalizing gene and organism mentions in text
WBIE '09 Proceedings of the Workshop on Biomedical Information Extraction
Modeling relations and their mentions without labeled text
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
A bootstrapping approach for training a NER with conditional random fields
EPIA'11 Proceedings of the 15th Portugese conference on Progress in artificial intelligence
Evaluating semantic evaluations: how RTE measures up
MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment
Hi-index | 0.00 |
Biology has now become an information science, and researchers are increasingly dependent on expert-curated biological databases to organize the findings from the published literature. We report here on a series of experiments related to the application of natural language processing to aid in the curation process for FlyBase. We focused on listing the normalized form of genes and gene products discussed in an article. We broke this into two steps: gene mention tagging in text, followed by normalization of gene names. For gene mention tagging, we adopted a statistical approach. To provide training data, we were able to reverse engineer the gene lists from the associated articles and abstracts, to generate text labeled (imperfectly) with gene mentions. We then evaluated the quality of the noisy training data (precision of 78%, recall 88%) and the quality of the HMM tagger output trained on this noisy data (precision 78%, recall 71%). In order to generate normalized gene lists, we explored two approaches. First, we explored simple pattern matching based on synonym lists to obtain a high recall/low precision system (recall 95%, precision 2%). Using a series of filters, we were able to improve precision to 50% with a recall of 72% (balanced F-measure of 0.59). Our second approach combined the HMM gene mention tagger with various filters to remove ambiguous mentions; this approach achieved an F-measure of 0.72 (precision 88%, recall 61%). These experiments indicate that the lexical resources provided by FlyBase are complete enough to achieve high recall on the gene list task, and that normalization requires accurate disambiguation; different strategies for tagging and normalization trade off recall for precision.