Protein complex prediction via cost-based clustering

  • Authors:
  • A. D. King; N. Pržulj; I. Jurisica

  • Affiliations:
  • Department of Computer Science, University of Toronto, Toronto, M5S 3G4, Canada (all three authors)

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: Understanding the principles of cellular organization and function can be enhanced by detecting known protein complexes, and predicting still-undiscovered ones, within the cell's protein-protein interaction (PPI) network. Such predictions may serve as an inexpensive tool for directing biological experiments. The growing amount of available PPI data calls for an accurate and scalable approach to protein complex identification.

Results: We have developed the Restricted Neighborhood Search Clustering Algorithm (RNSC), which efficiently partitions networks into clusters using a cost function. We applied this cost-based clustering algorithm to the PPI networks of Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans to identify and predict protein complexes. We have determined functional and graph-theoretic properties of true protein complexes from the MIPS database. Based on these properties, we defined filters to distinguish between identified network clusters and true protein complexes.

Conclusions: Our application of the cost-based clustering algorithm provides an accurate and scalable method for detecting and predicting protein complexes within a PPI network.

Availability: The RNSC algorithm and data processing code are available upon request from the authors.

Supplementary Information: Supplementary data are available at http://www.cs.utoronto.ca/~juris/data/ppi04/
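The abstract does not give RNSC's cost function or search procedure, so the following is only a rough illustration of what "partitioning a network with a cost function" can look like. It is a minimal Python sketch under stated assumptions: the cost counts "bad" pairs (interactions split across clusters plus non-interacting proteins placed in the same cluster), and a plain greedy descent moves one vertex at a time, considering only the clusters that already contain one of its neighbors. That restriction is our assumption, not necessarily what "restricted neighborhood" means in RNSC. The names `naive_cost`, `clusters`, `local_search`, `density` and `filter_clusters`, along with the size and density thresholds, are hypothetical. The paper derives its actual filters from properties of MIPS complexes. This is not the authors' RNSC implementation, which is available from them on request.

```python
from collections import defaultdict


def clusters(assign):
    """Invert a vertex -> cluster map into cluster -> sorted member list."""
    members = defaultdict(list)
    for v, c in assign.items():
        members[c].append(v)
    return {c: sorted(vs) for c, vs in members.items()}


def naive_cost(adj, assign):
    """Hypothetical cost: edges between clusters plus non-adjacent pairs
    inside a cluster (each unordered pair counted once)."""
    cost = 0
    for u in adj:
        for v in adj[u]:
            if u < v and assign[u] != assign[v]:
                cost += 1  # interaction split across two clusters
    for group in clusters(assign).values():
        for i, u in enumerate(group):
            for v in group[i + 1:]:
                if v not in adj[u]:
                    cost += 1  # two proteins grouped without an interaction
    return cost


def local_search(adj, assign, max_rounds=100):
    """Greedy descent: move each vertex to the neighboring cluster that most
    lowers the cost; stop when a full round makes no improving move."""
    assign = dict(assign)
    current = naive_cost(adj, assign)
    for _ in range(max_rounds):
        improved = False
        for v in sorted(adj):
            original = assign[v]
            # Assumed restriction: only clusters holding one of v's neighbors.
            candidates = {assign[u] for u in adj[v]} - {original}
            best_cluster, best_cost = original, current
            for c in sorted(candidates):
                assign[v] = c
                cost = naive_cost(adj, assign)  # full recompute, O(n^2)
                if cost < best_cost:
                    best_cluster, best_cost = c, cost
            assign[v] = best_cluster
            if best_cluster != original:
                current, improved = best_cost, True
        if not improved:
            break
    return assign, current


def density(adj, group):
    """Fraction of possible pairs within the group that interact."""
    k = len(group)
    if k < 2:
        return 0.0
    edges = sum(1 for i, u in enumerate(group) for v in group[i + 1:] if v in adj[u])
    return edges / (k * (k - 1) / 2)


def filter_clusters(adj, assign, min_size=4, min_density=0.7):
    """Keep clusters passing hypothetical size and density thresholds."""
    return sorted(g for g in clusters(assign).values()
                  if len(g) >= min_size and density(adj, g) >= min_density)


# Toy network: two 4-cliques {0..3} and {4..7} joined by the edge 3-4.
edges = [(u, v) for block in ([0, 1, 2, 3], [4, 5, 6, 7])
         for i, u in enumerate(block) for v in block[i + 1:]] + [(3, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

final, cost = local_search(adj, {v: v for v in adj})
print(cost, filter_clusters(adj, final))  # prints: 1 [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Starting from singleton clusters, the descent on this toy network settles on the two cliques. The only remaining cost is the single bridging edge, and both clusters pass the hypothetical filter. Recomputing the full cost for every candidate move keeps the sketch short but is quadratic per evaluation; a scalable implementation would instead update the cost incrementally as vertices move.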