BioRAT: extracting biological information from full-length papers

Authors:
David P. A. Corney;Bernard F. Buxton;William B. Langdon;David T. Jones
Affiliations:
Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK;Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK;Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK;Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE.

Results: We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.

Availability: The software and documentation can be found at http://bioinf.cs.ucl.ac.uk/biorat
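
The recall and precision figures quoted in the abstract follow the standard information-extraction definitions: precision is the fraction of extracted facts that are correct, and recall is the fraction of target facts that were extracted. The sketch below is illustrative only and is not part of BioRAT. The class `ExtractionCounts`, the function names and the toy counts are hypothetical. The F1 score (the harmonic mean of precision and recall) is not reported in the abstract; it is computed here from the quoted percentages simply as one possible way to compare the two settings on a single scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionCounts:
    """Hypothetical tally of an extraction run against a gold standard."""
    true_positives: int   # extracted facts that are in the gold standard
    false_positives: int  # extracted facts that are not in the gold standard
    false_negatives: int  # gold-standard facts the system missed


def precision(c: ExtractionCounts) -> float:
    """Fraction of extracted facts that are correct."""
    extracted = c.true_positives + c.false_positives
    return c.true_positives / extracted if extracted else 0.0


def recall(c: ExtractionCounts) -> float:
    """Fraction of target (gold-standard) facts that were extracted."""
    targets = c.true_positives + c.false_negatives
    return c.true_positives / targets if targets else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy counts, invented purely to exercise the definitions.
toy = ExtractionCounts(true_positives=8, false_positives=2, false_negatives=12)
print(f"toy precision = {precision(toy):.2f}, toy recall = {recall(toy):.2f}")

# (precision, recall) pairs as quoted in the abstract.
reported = {
    "abstracts": (0.5507, 0.2031),
    "full-length papers": (0.5125, 0.4360),
}
for setting, (p, r) in reported.items():
    # F1 is derived here for illustration; the abstract does not report it.
    print(f"{setting}: F1 = {f1_score(p, r):.3f}")
```

Read this way, the move from abstracts to full-length papers roughly doubles recall while precision drops by a few percentage points, which is the trade-off the Results paragraph describes.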