Integrating multi-attribute similarity networks for robust representation of the protein space

  • Authors:
  • Orhan Çamoğlu; Tolga Can; Ambuj K. Singh

  • Affiliations:
  • Department of Computer Science, University of California Santa Barbara, CA 93106, USA; Department of Computer Engineering, Middle East Technical University 06531, Ankara, Turkey; Department of Computer Science, University of California Santa Barbara, CA 93106, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Abstract

Motivation: A global view of the protein space is essential for functional and evolutionary analysis of proteins. To achieve this, a similarity network can be built from pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure, so their utility depends heavily on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used.

Results: We propose a novel approach for analyzing multi-attribute similarity networks that combines random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence- and structure-based similarity measures. For each attribute of the similarity network, random walks are used to compute a measure of affinity from a given protein to every other protein in the network. This process exploits the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single-attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.

Availability: Source code is available upon request from the primary author.

Contact: orhan@cs.ucsb.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
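The abstract does not specify the exact random-walk formulation or the Bayesian model. The following is a minimal sketch of the general idea under several assumptions:

- Per-attribute affinities come from a random walk with restart (personalized PageRank) at the query protein.
- Each attribute's class posterior is estimated as the share of walk mass that lands on proteins of each class.
- The attributes are combined with a naive Bayes rule that assumes they are conditionally independent given the class.

The function names (`rwr_affinity`, `class_posterior`, `naive_bayes_combine`, `sym`), the restart probability of 0.15, the priors, and the toy six-protein graphs are hypothetical illustrations. They are not the authors' implementation or data.

```python
import numpy as np


def rwr_affinity(W, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart on a nonnegative similarity matrix W (assumed formulation).

    Iterates r <- (1 - restart) * P @ r + restart * e until the L1 change drops
    below tol; r[j] is then the long-run probability of the walker being at j.
    """
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0           # guard against isolated proteins
    P = W / col_sums                        # column-stochastic: P[i, j] = Pr(j -> i)
    e = np.zeros(W.shape[0])
    e[start] = 1.0                          # restart distribution concentrated on the query
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * (P @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def class_posterior(affinity, labels, classes, query, eps=1e-12):
    """Hypothetical estimate of P(class | one attribute): share of walk mass per class."""
    others = np.arange(len(labels)) != query   # exclude the query itself
    mass = np.array([affinity[others & (labels == c)].sum() for c in classes])
    mass = mass + eps                          # keep every class strictly positive
    return mass / mass.sum()


def naive_bayes_combine(posteriors, prior):
    """Combine per-attribute posteriors assuming conditional independence:
    P(c | x_1..x_k) is proportional to prod_i P(c | x_i) / P(c)^(k-1)."""
    posteriors = np.asarray(posteriors)
    k = posteriors.shape[0]
    log_post = np.log(posteriors).sum(axis=0) - (k - 1) * np.log(prior)
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    return post / post.sum()


def sym(n, edges):
    """Build a symmetric weighted adjacency matrix from (i, j, weight) triples."""
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return W


# Toy data: protein 0 is the unlabeled query; 1-2 are class 0, 3-5 are class 1.
labels = np.array([-1, 0, 0, 1, 1, 1])
classes = [0, 1]
W_seq = sym(6, [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (0, 3, 0.1)])
W_str = sym(6, [(0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)])

prior = np.array([0.4, 0.6])  # class frequencies among the labeled proteins
posteriors = [class_posterior(rwr_affinity(W, 0), labels, classes, 0)
              for W in (W_seq, W_str)]
combined = naive_bayes_combine(posteriors, prior)
print(classes[int(np.argmax(combined))])  # → 0
```

In this sketch, a walker that keeps restarting at the query spends most of its time inside densely connected regions of each attribute graph. That is one way a random walk can use the implicit clustering structure that a purely local ranking by direct pairwise similarity ignores. The naive Bayes step then lets an attribute that separates the classes sharply (here, the structure graph) outweigh one that is ambiguous.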