Multi-label correlated semi-supervised learning for protein function prediction

  • Authors:
  • Jonathan Q. Jiang

  • Affiliations:
  • Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

  • Venue:
  • ISBRA'11 Proceedings of the 7th international conference on Bioinformatics research and applications
  • Year:
  • 2011

Abstract

The growing volume of molecular interaction data has led to a considerable number of computational approaches for studying protein function in the context of networks. These algorithms, however, treat each functional class independently and therefore have difficulty assigning multiple functions to a protein simultaneously. We propose a new semi-supervised algorithm, called MCSL, that takes the correlations among functional categories into account and thereby improves performance significantly. The guiding intuition is that a protein can receive label information not only from its neighbors annotated with the same category in the functional-linkage network, but also from its partners labeled with other classes in the category network, provided their respective neighborhood topologies are a good match. We encode this intuition as a two-dimensional version of network-based learning with local and global consistency. Experiments on a Saccharomyces cerevisiae protein-protein interaction network, using 5-fold cross-validation with 66 second-level and 77 informative MIPS functional categories respectively, show that our algorithm achieves superior performance compared with four state-of-the-art methods. Furthermore, we make predictions for 204 uncharacterized proteins; most of these assignments can be found directly in, or inferred indirectly from, the SGD database.
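
The abstract does not give the MCSL update rule, so the following is only a minimal sketch of the stated intuition, not the paper's algorithm. It assumes one plausible two-dimensional extension of local-and-global-consistency label propagation: scores spread along the protein network (rows) and along a category-correlation network (columns) at the same time. The function names, the weights `alpha` and `beta`, and the toy data are all hypothetical.

```python
import numpy as np

def sym_normalize(W):
    """Return D^{-1/2} W D^{-1/2}; nodes with zero degree keep zero rows."""
    d = W.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate_2d(W_protein, W_category, Y, alpha=0.5, beta=0.3,
                 n_iter=200, tol=1e-9):
    """Assumed update: F <- alpha*S_p F + beta*F S_c + (1-alpha-beta)*Y.

    W_protein  : (n, n) symmetric protein-protein adjacency matrix
    W_category : (k, k) symmetric category-correlation matrix
    Y          : (n, k) known annotations (1 = annotated, 0 = unknown)
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    S_p = sym_normalize(np.asarray(W_protein, dtype=float))
    S_c = sym_normalize(np.asarray(W_category, dtype=float))
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(n_iter):
        # Row term: same-category scores from neighboring proteins.
        # Column term: scores of correlated categories on the same protein.
        F_new = alpha * (S_p @ F) + beta * (F @ S_c) + (1 - alpha - beta) * Y
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    return F

# Toy data (hypothetical): four proteins on a path 0-1-2-3, two correlated categories.
W_protein = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
W_category = np.array([[0, 1],
                       [1, 0]])
Y = np.array([[1, 0],   # protein 0 is annotated with category 0
              [0, 0],
              [0, 0],
              [0, 1]])  # protein 3 is annotated with category 1

F_corr = propagate_2d(W_protein, W_category, Y)               # with category coupling
F_indep = propagate_2d(W_protein, W_category, Y, beta=0.0)    # categories independent
```

Setting `beta=0.0` reduces the sketch to independent per-category propagation, the situation the abstract criticizes. With `beta > 0`, protein 3 gains score for category 0 because it carries the correlated category 1. Requiring `alpha + beta < 1` keeps the iteration a contraction, so it converges for any symmetric non-negative input networks.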