Motivation: The identification of events such as protein-protein interactions (PPIs) from the scientific literature is a complex task. One reason is that there is no formal syntax for denoting such relations in scientific text. Nonetheless, understanding how such relational events are represented is important for improving information extraction solutions (e.g., for gene regulatory events). In this study, we analyze publicly available protein interaction corpora (AIMed, BioInfer, BioCreAtIvE II) to determine the range of verbs used to denote protein interactions and to measure their predictive capacity for the identification of PPI events. Our analysis is based on syntactic language patterns. This restriction has the advantage that the verb mention serves as the independent variable in the experiments, which makes the results comparable across verbs. The initial selection of verbs was derived from a systematic analysis of the scientific literature and of existing PPI corpora. We distinguished modifying interactions (MIs), such as posttranslational modifications (PTMs), from non-modifying interactions (NMIs), and assumed that MIs have a higher predictive capacity because they rest on stronger scientific evidence of the interaction. We found that MIs are less frequent in the corpus but can be extracted at the same precision levels as PPIs. A significant portion of the correct PPI reports in the BioCreAtIvE II corpus use the verb "associate", which semantically does not prove a relation. The performance of every monitored verb is listed, which allows specific verbs to be selected to improve the performance of PPI extraction solutions. Programmatic access to the text processing modules is available online (www.ebi.ac.uk/webservices/whatizit/info.jsf), and the full analysis of Medline abstracts will be made available through the Web pages of the Rebholz group.
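To make the idea of a verb's "predictive capacity" concrete, the sketch below computes a per-verb precision: among sentences in which a verb matches a simple pattern, the fraction that a gold standard labels as reporting a PPI. Everything here is an illustrative assumption rather than the study's actual setup: the pattern (the verb must occur between two protein mentions), the crude suffix-based inflection matching, the four-verb lexicon `VERB_CLASSES` with its MI/NMI labels, and the labeled `toy_corpus` are all hypothetical and are not taken from AIMed, BioInfer, or BioCreAtIvE II.

```python
import re
from collections import defaultdict

# Hypothetical verb lexicon: lemma -> interaction class.
# MI = modifying interaction (e.g., a PTM), NMI = non-modifying interaction.
VERB_CLASSES = {
    "phosphorylate": "MI",
    "ubiquitinate": "MI",
    "bind": "NMI",
    "associate": "NMI",
}


def verb_pattern(lemma):
    """Crude inflection matcher: stem plus a few English verb suffixes.

    Irregular forms (e.g., "bound") and nominalizations
    (e.g., "phosphorylation") are deliberately not matched.
    """
    stem = lemma[:-1] if lemma.endswith("e") else lemma
    return re.compile(r"\b" + re.escape(stem) + r"(?:e|es|ed|ing|s)?\b", re.IGNORECASE)


def matches_pattern(sentence, proteins, pattern):
    """Assumed syntactic pattern "PROT ... VERB ... PROT": the verb must
    lie between the first and last protein mention in the sentence."""
    positions = sorted(
        m.start() for p in proteins for m in re.finditer(re.escape(p), sentence)
    )
    if len(positions) < 2:
        return False
    return any(positions[0] < m.start() < positions[-1] for m in pattern.finditer(sentence))


def verb_precision(corpus):
    """corpus: iterable of (sentence, protein_mentions, is_ppi) with gold labels.

    Returns {lemma: precision} for every verb that matched at least once.
    """
    hits = defaultdict(int)
    correct = defaultdict(int)
    for sentence, proteins, is_ppi in corpus:
        for lemma in VERB_CLASSES:
            if matches_pattern(sentence, proteins, verb_pattern(lemma)):
                hits[lemma] += 1
                correct[lemma] += int(is_ppi)
    return {lemma: correct[lemma] / hits[lemma] for lemma in hits}


# Toy, hand-labeled sentences (illustrative only, not corpus data).
toy_corpus = [
    ("RAF1 phosphorylates MEK1 in vitro.", ["RAF1", "MEK1"], True),
    ("p53 binds MDM2.", ["p53", "MDM2"], True),
    ("BRCA1 is associated with BRCA2 expression.", ["BRCA1", "BRCA2"], False),
    ("CDK2 associates with cyclin E.", ["CDK2", "cyclin E"], True),
]

if __name__ == "__main__":
    print(verb_precision(toy_corpus))
    # {'phosphorylate': 1.0, 'bind': 1.0, 'associate': 0.5}
```

Because the verb is the only thing that varies between counts, per-class figures (MI vs. NMI) could be obtained by summing `hits` and `correct` over the lemmas that `VERB_CLASSES` assigns to each class. A realistic system would match patterns on parsed or chunked text rather than on raw character offsets.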