Protein complexes are key units for understanding protein mechanisms. Traditional protein complex identification methods apply a soft (overlapping) network clustering algorithm to a protein-protein interaction (PPI) network and predict the resulting clusters as protein complexes. More recently, the AP-MS technique, combined with a scoring method, has made it possible to measure co-complex relationships among proteins. Unlike traditional PPI networks, AP-MS data can provide negative evidence, indicating which proteins are unlikely to belong to the same protein complex. However, most existing network clustering algorithms cannot use these negative similarity scores. In this paper, we propose a soft network clustering algorithm, SR-MCL-N, that takes negative similarity scores into account. SR-MCL-N is a variation of an earlier algorithm, SR-MCL, a network clustering algorithm based on transition flow. Additionally, because the scoring approach we use produces a dense similarity matrix, we apply a sparsification technique to the similarity matrix. Using the gold standard CYC2008 and GO terms, we first show that sparsification not only speeds up SR-MCL-N but also lets it generate more accurate clusters. We then compare SR-MCL-N against SR-MCL and a hierarchical algorithm that also considers negative similarity scores. The results indicate that our algorithm outperforms the others because SR-MCL-N both generates overlapping clusters and takes negative similarity scores into account.
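To make the idea of sparsifying a dense, signed similarity matrix concrete, the following is a minimal sketch only. The abstract does not say which sparsification technique is used, so the rule here (each protein keeps its `k` strongest positive and `k` strongest negative scores) is an assumption chosen for illustration. The function name `sparsify_signed` and the toy matrix `example_sim` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int]


def sparsify_signed(sim: Matrix, k: int) -> Dict[Edge, float]:
    """Keep, per node, the k strongest positive and k strongest negative scores.

    Assumed rule for illustration; not necessarily the paper's technique.
    `sim` is assumed to be a symmetric similarity matrix with a zero diagonal.
    """
    n = len(sim)
    kept: Dict[Edge, float] = {}
    for i in range(n):
        others = [(sim[i][j], j) for j in range(n) if j != i]
        # Largest positive scores first.
        strongest_pos = sorted((p for p in others if p[0] > 0), reverse=True)[:k]
        # Most negative scores first.
        strongest_neg = sorted(p for p in others if p[0] < 0)[:k]
        for score, j in strongest_pos + strongest_neg:
            # An edge survives if either endpoint selects it; store it once.
            kept[(min(i, j), max(i, j))] = score
    return kept


# Toy 4-protein matrix (made-up values, not data from the paper).
example_sim: Matrix = [
    [0.0, 0.9, 0.2, -0.7],
    [0.9, 0.0, 0.1, -0.1],
    [0.2, 0.1, 0.0, 0.8],
    [-0.7, -0.1, 0.8, 0.0],
]

sparse = sparsify_signed(example_sim, k=1)
print(sparse)
```

With `k=1`, the weak positive pairs (0, 2) and (1, 2) are dropped, while the strong positive pairs and the negative evidence are retained. Keeping negative scores in a separate quota is one way to stop the sparsification step from discarding all negative evidence before clustering.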