Measuring prediction capacity of individual verbs for the identification of protein interactions

Authors:
Dietrich Rebholz-Schuhmann;Antonio Jimeno-Yepes;Miguel Arregui;Harald Kirsch
Affiliations:
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK;European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK;European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK;European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

Motivation: The identification of events such as protein-protein interactions (PPIs) from the scientific literature is a complex task, in part because there is no formal syntax for denoting such relations in scientific text. Nonetheless, it is important to understand how such relational events are represented in order to improve information extraction solutions (e.g., for gene regulatory events). In this study, we analyze publicly available protein interaction corpora (AIMed, BioInfer, BioCreAtIve II) to determine the scope of verbs used to denote protein interactions and to measure their predictive capacity for the identification of PPI events. Our analysis is based on syntactical language patterns. This restriction has the advantage that the verb mention serves as the independent variable in the experiments, which makes the results for different verbs comparable. The initial selection of verbs was generated from a systematic analysis of the scientific literature and existing PPI corpora. We distinguish modifying interactions (MIs), such as posttranslational modifications (PTMs), from non-modifying interactions (NMIs) and assume that MIs have a higher predictive capacity because stronger scientific evidence supports the interaction. We found that MIs are less frequent in the corpus but can be extracted at the same precision levels as PPIs. A significant portion of the correctly reported PPIs in the BioCreAtIve II corpus use the verb "associate", which semantically does not prove a relation. The performance of every monitored verb is listed, allowing specific verbs to be selected to improve the performance of PPI extraction solutions. Programmatic access to the text processing modules is available online (www.ebi.ac.uk/webservices/whatizit/info.jsf), and the full analysis of Medline abstracts will be made available through the Web pages of the Rebholz group.
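To make "predictive capacity of an individual verb" concrete, the following minimal sketch computes per-verb precision: among all pattern matches that use a given verb, the fraction that correspond to an interaction annotated in the corpus. It is an illustration under stated assumptions, not the authors' implementation. The function name `verb_precision`, the input format (pairs of verb and a correct/incorrect flag), and the toy data are all hypothetical and invented for this example; they are not figures from the study.

```python
from collections import defaultdict


def verb_precision(matches):
    """Compute per-verb precision from labeled pattern matches.

    `matches` is an iterable of (verb, is_correct) pairs, where `is_correct`
    says whether the matched sentence contains an annotated interaction.
    This input format is an assumption made for this sketch.
    Returns {verb: (true_positives, total_matches, precision)}.
    """
    counts = defaultdict(lambda: [0, 0])  # verb -> [true positives, total]
    for verb, is_correct in matches:
        counts[verb][1] += 1
        if is_correct:
            counts[verb][0] += 1
    return {verb: (tp, n, tp / n) for verb, (tp, n) in counts.items()}


# Toy data, invented for illustration only (not results from the paper).
toy_matches = [
    ("associate", True), ("associate", False),
    ("associate", True), ("associate", False),
    ("phosphorylate", True), ("phosphorylate", True),
    ("bind", True), ("bind", True), ("bind", False),
]

scores = verb_precision(toy_matches)

# List verbs from highest to lowest precision, as one might when choosing
# which verbs to keep in an extraction pattern set.
for verb, (tp, n, precision) in sorted(scores.items(), key=lambda kv: -kv[1][2]):
    print(f"{verb:15s} {tp}/{n} correct, precision={precision:.2f}")
```

Because the verb is the only thing that varies between patterns, a table like this lets verbs be compared directly. A frequent but weak verb then shows up as many matches with low precision.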