Learning of protein interaction networks

  • Authors:
  • Ziv Bar-Joseph;Judith Klein-Seetharaman;Yanjun Qi

  • Affiliations:
  • Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University

  • Venue:
  • Learning of protein interaction networks
  • Year:
  • 2008

Abstract

Protein-protein interactions (PPI) play a key role in determining the outcome of most cellular processes. Correctly identifying and characterizing protein interactions and the networks they comprise is critical for understanding the molecular mechanisms within the cell.

Large-scale biological experimental methods can directly and systematically detect the set of interacting proteins within an organism. Unfortunately, the resulting datasets are often incomplete and exhibit high false positive and false negative rates. In addition to the direct experimental data, a number of large biological datasets also provide indirect evidence about protein interaction relationships. Computational approaches can therefore be used to combine multiple information sources in order to predict the sets of interacting protein pairs and to identify important biological substructures in the resulting network.

In this dissertation, we first carry out a systematic study of the efficacy of supervised learning methods for integrating direct and indirect biological evidence to predict pairwise protein interactions. The results indicate that the utility of the information sources, the way the data is encoded as features, the target types of protein interactions and the computational approaches used all significantly affect the prediction of such interactions. We then propose four learning algorithms for deriving PPI networks from different perspectives.

(I) A combined computational and experimental approach is proposed for predicting interaction partners of human membrane receptors. A random forest binary classifier is employed to determine whether a potential receptor-human pair interacts. Biological feedback is used to optimize feature encoding and improve the accuracy of predictions. The resulting receptor PPI network is then analyzed through graph property analysis, graph module identification and a search for protein-family related network patterns. Several novel predictions are further validated experimentally. Our proposed framework shows that focusing on specific subnetworks generates better predictions. The predicted network provides the most reliable dataset on the network of interactions involving human membrane receptors to date.

(II) Considering that PPI networks are highly sparse graphs and that no large negative reference set (non-interacting pairs) is available, we design a ranking approach to identify candidate interaction pairs that are "similar" to known interacting pairs. Robust similarity estimation is especially important here because of the high noise rates and the many missing values in biological data. Our ranking method determines the degree of similarity between protein pairs using a trained random forest model. The similarity is then used by a weighted k-Nearest-Neighbor algorithm to rank candidate protein pairs. Applying the algorithm to yeast data produces robust performance results that compare favorably with previously suggested methods.

(III) A multiple-view learning strategy (referred to as "Mixture of Feature Experts") is further proposed for predicting PPIs; it takes into account the heterogeneous nature of feature properties. First, features are split into roughly homogeneous groups. Then, each individual group (called an "expert") gives classification opinions, and the experts' scores are combined using weighted voting. Different experts have different degrees of influence on the prediction depending on the available features. When applied to yeast and human data, this method improves upon the generally used methods, and the weighting of the experts provides a means to evaluate the prediction based on high-scoring feature experts.

(IV) "Protein complex" (a special group formation) is a typical pattern in protein-protein interaction networks. We present an algorithm for inferring protein complexes based on graph topological patterns and biological properties. Each complex subgraph is modeled using a probabilistic Bayesian network. The derived log-likelihood ratio is then used to score subgraphs in the protein interaction graph and to identify new complexes. We apply this method to protein interaction data in yeast. Our algorithm recovers known complexes much better than previous clique-based algorithms.

In summary, our proposed algorithms provide strong computational tools for predicting and analyzing protein-protein interaction networks. They have been applied successfully in yeast and human, and have generated promising results. For instance, without the novel interaction between rhodopsin and chemokines found by our computational approach, the important functional implication of rhodopsin in the immune system could not have been discovered.
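The ranking idea in part (II) can be illustrated with a minimal sketch. The abstract does not give the exact similarity or scoring formulas, so everything below is an assumption made for illustration: similarity is taken to be the fraction of trees in which two protein pairs reach the same leaf (a common random forest proximity), the leaf indices are supplied directly as stand-ins for the output of a trained forest, and a candidate's score is the sum of its similarities to its k most similar known interacting pairs. The names `forest_similarity` and `rank_candidates` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Leaf index reached in each tree of a (hypothetical) trained random forest.
LeafPath = Tuple[int, ...]


def forest_similarity(a: LeafPath, b: LeafPath) -> float:
    """Fraction of trees in which both items land in the same leaf (assumed proximity)."""
    if len(a) != len(b) or not a:
        raise ValueError("leaf paths must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_candidates(
    candidates: Dict[str, LeafPath],
    known_positives: Sequence[LeafPath],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate pairs by summed similarity to their k nearest known positives."""
    scored = []
    for name, leaves in candidates.items():
        # Similarity of this candidate to every known interacting pair, best first.
        sims = sorted(
            (forest_similarity(leaves, pos) for pos in known_positives),
            reverse=True,
        )
        # Similarity-weighted vote of the k nearest positives (assumed scoring rule).
        scored.append((name, sum(sims[:k])))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data: a 4-tree forest, three known interacting pairs, three candidates.
    known = [(1, 2, 3, 4), (1, 2, 3, 5), (0, 2, 3, 4)]
    candidates = {"A-B": (1, 2, 3, 4), "C-D": (9, 9, 3, 9), "E-F": (1, 2, 0, 0)}
    for name, score in rank_candidates(candidates, known, k=2):
        print(name, score)
```

Because the score only uses known interacting pairs, the sketch needs no negative reference set, which matches the motivation stated in (II). A real pipeline would obtain the leaf indices from a forest trained on the biological features rather than from hand-written tuples.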