Extraction of regulatory gene/protein networks from Medline

Authors:
Jasmin Šarić; Lars Juhl Jensen; Rossitza Ouzounova; Isabel Rojas; Peer Bork
Affiliations:
EML Research gGmbH, D-69118 Heidelberg, Germany (Šarić, Rojas); European Molecular Biology Laboratory, D-69117 Heidelberg, Germany (Jensen, Ouzounova, Bork)
Venue:
Bioinformatics
Year:
2006

Abstract

Motivation: We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, also describes several other equally important regulatory mechanisms, in particular phosphorylation, which we have now extended our rule-based system to extract.

Results: This paper presents new results for the extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information [i.e. (de-)phosphorylation]. Precision remains stable, and recall increases slightly. From almost one million PubMed abstracts related to four model organisms, we extract regulatory networks and binary phosphorylation relations comprising 3319 relation chunks. The accuracy is 83–90% for gene expression relations and 86–95% for (de-)phosphorylation relations. To achieve this, we used an organism-specific resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus; for the domain in question, the tagger attained a POS tagging accuracy of 96.4%. Notably, although the rules were developed for yeast, they were successfully applied to both abstracts and full-text articles related to other organisms with comparable accuracy.

Availability: The revised GENIA corpus, the POS tagger, the extraction rules and the full sets of extracted relations are available from http://www.bork.embl.de/Docu/STRING-IE

Contact: saric@eml-r.org
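To make the idea of lexicon-backed, rule-based relation extraction concrete, here is a minimal Python sketch. It is not STRING-IE's actual rule set. The toy name list (`LEXICON`), the verb-to-relation mapping (`RELATION_VERBS`), the relation labels and the two patterns (active "A phosphorylates B" and passive "B is phosphorylated by A") are all hypothetical assumptions made for illustration. The sketch also omits the retrained POS tagger and the large organism-specific name resource that the abstract describes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy lexicon of yeast gene/protein names (assumption; the
# paper relies on a much larger organism-specific name resource).
LEXICON = {"Pho85", "Pho4", "Snf1", "Mig1", "GAL1"}

# Hypothetical mapping from trigger verbs to relation labels (assumption).
RELATION_VERBS = {
    "phosphorylates": "phosphorylation",
    "phosphorylated": "phosphorylation",
    "dephosphorylates": "dephosphorylation",
    "dephosphorylated": "dephosphorylation",
    "activates": "upregulation",
    "induces": "upregulation",
    "represses": "downregulation",
}

# Auxiliaries that signal a passive construction ("B is phosphorylated by A").
PASSIVE_AUX = {"is", "was", "are", "were"}


@dataclass(frozen=True)
class Relation:
    agent: str     # the regulating gene/protein
    relation: str  # relation label taken from RELATION_VERBS
    target: str    # the regulated gene/protein


def extract_relations(sentence: str) -> list[Relation]:
    """Apply two hand-written rules to one sentence and return the matches."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)  # splits on punctuation/spaces
    found: list[Relation] = []
    for i, token in enumerate(tokens):
        relation = RELATION_VERBS.get(token.lower())
        if relation is None:
            continue
        left = tokens[i - 1] if i >= 1 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        # Rule 1, active voice: "<name> <verb> <name>".
        if left in LEXICON and right in LEXICON:
            found.append(Relation(left, relation, right))
        # Rule 2, passive voice: "<name> is/was <verb> by <name>".
        elif (
            i >= 2
            and tokens[i - 1].lower() in PASSIVE_AUX
            and tokens[i - 2] in LEXICON
            and right == "by"
            and i + 2 < len(tokens)
            and tokens[i + 2] in LEXICON
        ):
            found.append(Relation(tokens[i + 2], relation, tokens[i - 2]))
    return found


if __name__ == "__main__":
    for s in ["Pho85 phosphorylates Pho4.", "Mig1 is phosphorylated by Snf1."]:
        print(s, "->", extract_relations(s))
```

Both voices map to the same `Relation(agent, relation, target)` triple, so active and passive mentions of one event can be merged into a single network edge. Real sentences are far more varied, with nominalizations, coordination and intervening modifiers. A practical system therefore needs POS information and many more rules than these two patterns.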