Predicting protein-protein interactions from multimodal biological data sources via nonnegative matrix tri-factorization

Authors:
Hua Wang;Heng Huang;Chris Ding;Feiping Nie
Affiliations:
Department of Computer Science and Engineering, University of Texas, Arlington, TX;Department of Computer Science and Engineering, University of Texas, Arlington, TX;Department of Computer Science and Engineering, University of Texas, Arlington, TX;Department of Computer Science and Engineering, University of Texas, Arlington, TX
Venue:
RECOMB'12 Proceedings of the 16th Annual International Conference on Research in Computational Molecular Biology
Year:
2012

Abstract

Because high-throughput experimental methods for discovering protein interactions have a high false positive rate, computational methods are necessary to complete the interactome expeditiously. However, when building classification models to identify putative protein interactions, selecting negative samples is much harder than selecting positive samples, which can be drawn from truly interacting protein pairs. Non-interacting protein pairs are those that currently lack experimental or computational evidence of a physical interaction or a functional association, yet they could still interact in reality. Instead of relying on heuristics as many existing works do, in this paper we address this difficulty in a principled way by formulating protein interaction prediction from a new mathematical perspective, sparse matrix completion, and we propose a novel Nonnegative Matrix Tri-Factorization (NMTF) based matrix completion approach to predict new protein interactions from existing protein interaction networks. Because matrix completion requires only positive samples and no negative samples, it circumvents the negative-sample selection problem faced by existing classification-based methods. Using manifold regularization, we further extend our method to integrate different biological data sources, such as protein sequences, gene expression, and protein structure information. Extensive experimental results on the Saccharomyces cerevisiae genome show that our new methods outperform related state-of-the-art protein interaction prediction methods.
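
The abstract does not give the objective function or update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a symmetric tri-factorization A ≈ F S Fᵀ of the observed interaction matrix A with F, S ≥ 0, plus a graph (manifold) regularizer tr(Fᵀ(D − W)F) built from side-information similarity matrices. The function name `nmtf_complete`, the parameters `k` and `lam`, the averaging of similarity matrices into one graph W, and the damped multiplicative updates are all assumptions made for illustration. The updates are a common heuristic and are not claimed to match the paper or to guarantee a monotone decrease of the objective.

```python
import numpy as np


def nmtf_complete(A, similarities=(), k=2, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Illustrative symmetric NMTF completion: A ~ F S F^T with F, S >= 0.

    Heuristically minimizes
        ||A - F S F^T||_F^2 + lam * tr(F^T (D - W) F),
    where W is the average of the given nonnegative similarity matrices
    (an assumption of this sketch) and D is its diagonal degree matrix.
    Returns (scores, F, S); scores[i, j] ranks candidate pair (i, j).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rng = np.random.default_rng(seed)

    # Combine side-information graphs; with none given, W = 0 and the
    # regularizer vanishes.
    if len(similarities) > 0:
        W = np.mean([np.asarray(s, dtype=float) for s in similarities], axis=0)
    else:
        W = np.zeros((n, n))
    D = np.diag(W.sum(axis=1))

    # Strictly positive random initialization; S is kept symmetric.
    F = rng.random((n, k)) + 0.1
    S = rng.random((k, k)) + 0.1
    S = (S + S.T) / 2

    for _ in range(n_iter):
        # F update: ratio of the negative to the positive part of the
        # gradient, damped by a square root.
        num = 2 * (A @ F @ S) + lam * (W @ F)
        den = 2 * (F @ S @ (F.T @ F) @ S) + lam * (D @ F) + eps
        F *= np.sqrt(num / den)

        # S update with F held fixed.
        FtF = F.T @ F
        S *= (F.T @ A @ F) / (FtF @ S @ FtF + eps)

    scores = F @ S @ F.T
    return (scores + scores.T) / 2, F, S


if __name__ == "__main__":
    # Toy network (made-up data): two groups of three proteins, with the
    # within-group pair (0, 2) left unobserved.
    A = np.array([
        [0, 1, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ], dtype=float)
    scores, F, S = nmtf_complete(A, k=2)
    print("score for unobserved pair (0, 2):", scores[0, 2])
    print("score for cross-group pair (0, 3):", scores[0, 3])
```

Only the observed (positive) entries of A drive the fit, and unobserved pairs are ranked by their reconstructed scores, which is the sense in which no explicit negative samples are selected. A fuller matrix completion formulation would treat the zeros as missing entries and not as observed values; this sketch does not do that. Each entry of `similarities` would hold one data source (for example a sequence-similarity matrix), under the assumption that it is nonnegative, symmetric, and indexed in the same protein order as A.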