Mining protein-protein interactions from GeneRIFs with OpenDMAP

Authors:
Andrew D. Fox;William A. Baumgartner;Helen L. Johnson;Lawrence E. Hunter;Donna K. Slonim
Affiliations:
Department of Computer Science, Tufts University, Medford, MA;Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO;Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO;Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO;Department of Computer Science, Tufts University, Medford, MA
Venue:
ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
Year:
2009

References:
Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology.
ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics.
Manual curation is not sufficient for annotation of genomic databases. Bioinformatics.

Abstract

We applied the OpenDMAP [1] and BioNLP-UIMA [2] NLP systems to the task of mining protein-protein interactions (PPIs) from GeneRIFs. Our goal was to assess and improve system performance on GeneRIF text. We identified several classes of errors in the system's output on a training dataset, most notably difficulty recognizing protein complexes, and modified the system to address them. To improve recognition of interactions involving protein complexes, we implemented a new protein-complex-resolution UIMA component. We also added a custom entity identification engine that uses GeneRIF metadata to annotate proteins that may have been missed by the other engines. These changes improved both recall and precision, raising the F-measure from 0.23 to 0.48. These results confirm that the targeted enhancements described here substantially improve performance.

Availability: Annotated data sets and source code for the new UIMA components can be found at http://bcb.cs.tufts.edu/GeneRIFs/
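
The F-measure quoted in the abstract combines precision and recall into a single score. As a minimal sketch of how such a score could be computed for extracted interactions, the Python below compares predicted protein pairs against gold-standard pairs. The matching rule (an unordered, exact match on both protein names), the function name `precision_recall_f`, and the toy protein pairs are all assumptions made for illustration; they are not taken from the paper, whose evaluation criteria may differ.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, balanced F-measure) for interaction pairs.

    Assumption: an interaction is an unordered pair of protein names, and a
    prediction counts as correct only if both names match a gold pair exactly.
    """
    # frozenset makes ("A", "B") and ("B", "A") compare as the same interaction.
    predicted_pairs = {frozenset(pair) for pair in predicted}
    gold_pairs = {frozenset(pair) for pair in gold}

    true_positives = len(predicted_pairs & gold_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data, not drawn from the paper's annotated data sets.
gold = [("TP53", "MDM2"), ("BRCA1", "BARD1"), ("CDK2", "CCNE1"), ("AKT1", "MTOR")]
predicted = [("MDM2", "TP53"), ("BRCA1", "BARD1"), ("TP53", "EP300")]

precision, recall, f_measure = precision_recall_f(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Because the balanced F-measure is the harmonic mean of precision and recall, it rises substantially only when both improve, which is consistent with the abstract's report that the changes raised recall and precision together.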