Rutabaga by any other name: extracting biological names

Authors:
Lynette Hirschman;Alexander A. Morgan;Alexander S. Yeh
Affiliations:
The MITRE Corporation, MS K312, 202 Burlington Rd., Bedford, MA;The MITRE Corporation, MS K312, 202 Burlington Rd., Bedford, MA;The MITRE Corporation, MS K312, 202 Burlington Rd., Bedford, MA
Venue:
Journal of Biomedical Informatics - Special issue: Sublanguage
Year:
2002

Abstract

As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93-95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75-80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
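The abstract refers to a simple lexical matching exercise and to precision and recall as the measures of tagging quality. The sketch below is only an illustration of that general idea, not the authors' procedure: the lexicon, the example sentence, and the helper names `tag_names` and `precision_recall` are all hypothetical. It tags every occurrence of a lexicon entry in a text and scores the result against a hand-marked gold set, which shows how an ambiguous name such as "white" (a gene name that is also an ordinary English word) can lower precision.

```python
import re

def tag_names(text, lexicon):
    """Return the set of (start, end, name) spans where a lexicon entry occurs in text."""
    spans = set()
    for name in lexicon:
        # Case-insensitive match that must not be embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((m.start(), m.end(), name))
    return spans

def precision_recall(predicted, gold):
    """Compute precision and recall of predicted spans against gold spans."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data: only the first "white" refers to the gene.
text = "Mutations in white alter eye color, but the white flower was unaffected."
lexicon = {"white", "hedgehog"}

first = text.index("white")
gold = {(first, first + len("white"), "white")}

predicted = tag_names(text, lexicon)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the matcher finds the one true gene mention but also tags the ordinary adjective, so recall is perfect while precision is halved. Pure string matching cannot tell the two uses apart, which is the kind of ambiguity the abstract lists among the obstacles.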