IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Detecting functional modules in protein-protein interaction (PPI) networks is an active research area with many practical applications. Two concerns persist, however: high-throughput experiments produce false interactions, and a single PPI network carries too little information to yield satisfactory results. To address these problems, we propose a soft clustering method based on Collective Non-negative Matrix Factorization (CoNMF) that efficiently integrates gene ontology (GO) annotations, gene expression data, and PPI networks. In our method, the three data sources are converted into two graphs represented by similarity adjacency matrices, and these graphs are jointly approximated by matrix factorizations that share a common factor, which gives a straightforward interpretation of the clustering results. Extensive experiments show that integrating multiple biological data sources improves module detection performance, and that CoNMF outperforms other multi-source data fusion methods by identifying a larger number of more precise protein modules that are biologically meaningful and overlap to some degree.
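The shared-factor idea can be illustrated with a minimal sketch. It is not the CoNMF objective or update rule itself, which the abstract does not specify. It assumes a generic collective factorization in which each similarity matrix A_i is approximated by W H_i with one non-negative factor W common to all graphs, fitted with standard multiplicative updates for the summed squared error. The names `collective_nmf`, `soft_memberships`, `A_ppi`, and `A_func`, and the toy matrices, are hypothetical.

```python
import numpy as np

def collective_nmf(graphs, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate every A_i in `graphs` as W @ H_i with a shared W >= 0.

    Assumed objective (not taken from the paper): sum_i ||A_i - W H_i||_F^2,
    minimised with Lee-Seung style multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n = graphs[0].shape[0]
    W = rng.random((n, k)) + eps                      # shared factor
    Hs = [rng.random((k, A.shape[1])) + eps for A in graphs]
    losses = []
    for _ in range(n_iter):
        # Update each graph-specific factor with W held fixed.
        for i, A in enumerate(graphs):
            Hs[i] *= (W.T @ A) / (W.T @ W @ Hs[i] + eps)
        # Update the shared factor using all graphs at once.
        num = sum(A @ H.T for A, H in zip(graphs, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
        losses.append(sum(np.linalg.norm(A - W @ H) ** 2
                          for A, H in zip(graphs, Hs)))
    return W, Hs, losses

def soft_memberships(W, eps=1e-9):
    """Row-normalise W so each protein gets a weight for every module."""
    return W / (W.sum(axis=1, keepdims=True) + eps)

# Toy data (hypothetical): six proteins, two modules, two similarity graphs.
block = np.kron(np.eye(2), np.ones((3, 3)))
A_ppi = 0.9 * block + 0.05     # stands in for a PPI-derived similarity graph
A_func = 0.7 * block + 0.10    # stands in for a GO/expression-derived graph

W, Hs, losses = collective_nmf([A_ppi, A_func], k=2)
M = soft_memberships(W)
```

Each row of `M` holds one protein's weights over the `k` modules. Thresholding those weights instead of taking the row-wise maximum lets a protein belong to more than one module, which is one simple way to obtain overlapping modules from a soft clustering.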