Comparison of modularization methods in application to different biological networks

Authors:
Zhuo Wang; Xin-Guang Zhu; Yazhu Chen; Yixue Li; Lei Liu
Affiliations:
Biomedical Instrument Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Plant Biology, University of Illinois at Urbana-Champaign, Illinois; Biomedical Instrument Institute, Shanghai Jiao Tong University, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai, China; The W. M. Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign, Illinois
Venue:
VDMB'06 Proceedings of the First International Conference on Data Mining and Bioinformatics
Year:
2006

Abstract

Most biological networks have been proposed to possess a modular organization, which increases their robustness, flexibility, and stability. Many clustering methods have been used to mine biological data and to partition complex networks into functional modules. Most of these methods require the number of modules to be set in advance and can therefore produce biased results. The Markov clustering method (MCL) and the simulated annealing module-detection method (SA) remove this requirement and can objectively separate relatively dense subgraphs. In this paper, we compared these two module-detection methods on three types of biological data: protein family classification, microarray clustering, and modularity of metabolic networks. We found that the two methods have different advantages for different biological networks. For the gene network based on Affymetrix microarray spike data, MCL recovered exactly the number of groups, and the contents of each group, defined by the spike data. For the gene network derived from actual expression data, neither method perfectly recovers the natural classification, but MCL performs slightly better than SA. However, as more random noise is added to the gene expression values, SA generates better modular structures with higher modularity. Next, we compared the modularization results of MCL and SA for protein family classification and found that the modules detected by SA could not be matched well with the Structural Classification of Proteins (SCOP) database, which suggests that MCL is ideally suited to the rapid and accurate detection of protein families. In addition, we used both methods to detect modules in the metabolic network of E. coli. MCL gives a trivial clustering whose modules are biologically insignificant. In contrast, SA detects modules that correspond well to the KEGG functional classification. Moreover, the modularity of several other metabolic networks is much higher when detected by SA than by MCL. In summary, MCL is better suited to modularizing relatively complete and definite data, such as a protein family network. In contrast, SA is less sensitive to noise such as experimental error or incomplete data, and it outperforms MCL when modularizing gene networks based on microarray data and large-scale metabolic networks constructed from incomplete databases.
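
The abstract ranks partitions by their modularity without stating the formula. As a minimal sketch, assuming the standard Newman-Girvan definition for an undirected, unweighted graph, Q = sum over modules s of [l_s / L - (d_s / (2L))^2], where L is the total number of edges, l_s the number of edges inside module s, and d_s the summed degree of its nodes, the score could be computed as below. The function name `modularity`, its arguments, and the toy graph are illustrative assumptions and are not taken from the paper, whose exact formulation may differ.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition (illustrative sketch).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to its module label.
    """
    edges = list(edges)
    total = len(edges)  # L: total number of edges
    if total == 0:
        return 0.0
    internal = defaultdict(int)    # l_s: edges with both ends in module s
    degree_sum = defaultdict(int)  # d_s: summed degree of nodes in module s
    for u, v in edges:
        mu, mv = membership[u], membership[v]
        degree_sum[mu] += 1
        degree_sum[mv] += 1
        if mu == mv:
            internal[mu] += 1
    return sum(
        internal[s] / total - (degree_sum[s] / (2 * total)) ** 2
        for s in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "all" for node in range(6)}

q_two = modularity(toy_edges, two_modules)
q_one = modularity(toy_edges, one_module)
```

On this toy graph, splitting the two triangles gives Q = 5/14 (about 0.357), while placing every node in a single module gives Q = 0. A higher Q means more edges fall inside modules than expected from node degrees alone, which is the sense in which one partition can be said to have higher modularity than another.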