gProt: annotating protein interactions using Google and Gene Ontology

  • Authors:
  • Rune Sætre;Amund Tveit;Martin Thorsen Ranang;Tonje S. Steigedal;Liv Thommesen;Kamilla Stunes;Astrid Lægreid

  • Affiliations:
  • Department of Computer and Information Science (Sætre, Tveit, Ranang) and Department of Cancer Research and Molecular Medicine (Steigedal, Thommesen, Stunes, Lægreid), Norwegian University of Science and Technology, Trondheim, Norway

  • Venue:
  • KES'05: Proceedings of the 9th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Volume Part III
  • Year:
  • 2005

Abstract

With the increasing amount of biomedical literature, there is a need for automatic information extraction to support biomedical researchers. Because biomedical information databases are incomplete, the extraction cannot be done straightforwardly using dictionaries, so several approaches using contextual rules and machine learning have previously been proposed. Our work is inspired by these approaches, but is novel in that it combines Google and Gene Ontology (GO) for annotating protein interactions. We obtained promising empirical results: of the terms in the answers provided by our system, gProt, 57.5% were valid GO annotations and 16.9% were protein names. The total error rate was 25.6%, consisting mainly of overly general answers and syntactic errors, but also including semantic errors, biological entities other than proteins and GO terms, and false information sources.
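As a quick arithmetic check of the reported figures, the short sketch below tallies them. It assumes (this is an inference, not something the abstract states outright) that the three percentages are shares of the same set of gProt answers. Under that assumption they sum to 100%, and roughly three quarters of the answers are either valid GO annotations or protein names.

```python
# Reported shares of gProt answers, in percent, taken from the abstract.
# Assumption: all three figures refer to the same set of answers.
shares = {
    "valid GO annotation": 57.5,
    "protein name": 16.9,
    "error": 25.6,
}

# Round to one decimal to absorb floating-point noise in the sums.
total = round(sum(shares.values()), 1)
non_error = round(shares["valid GO annotation"] + shares["protein name"], 1)

print(f"total = {total}%")
print(f"non-error answers = {non_error}%")
```

If the categories do partition the answers, the total comes out at exactly 100.0%, which supports reading the 25.6% error rate as the complement of the two correct-answer categories.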