A systematic comparison of genome scale clustering algorithms

  • Authors:
  • Jeremy J. Jay;John D. Eblen;Yun Zhang;Mikael Benson;Andy D. Perkins;Arnold M. Saxton;Brynn H. Voy;Elissa J. Chesler;Michael A. Langston

  • Affiliations:
  • The Jackson Laboratory, Bar Harbor ME;University of Tennessee, Knoxville TN;University of Tennessee, Knoxville TN;University of Göteborg, Göteborg, Sweden;Mississippi State University, Mississippi State MS;University of Tennessee, Knoxville TN;University of Tennessee, Knoxville TN;The Jackson Laboratory, Bar Harbor ME;University of Tennessee, Knoxville TN

  • Venue:
  • ISBRA'11 Proceedings of the 7th international conference on Bioinformatics research and applications
  • Year:
  • 2011

Abstract

A wealth of clustering algorithms has been applied to gene coexpression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression network analysis (WGCNA) and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray data, and they have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Clusters are scored using Jaccard similarity coefficients to assess the positive match of clusters to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
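
The Jaccard-based scoring mentioned in the abstract can be illustrated with a minimal sketch. The Jaccard coefficient of two gene sets is the size of their intersection divided by the size of their union. The abstract does not say how clusters are paired with pathways, so the best-match pairing below is an assumption, and the function names (`jaccard`, `best_match_scores`) and the toy gene and pathway labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def best_match_scores(clusters, pathways):
    """For each cluster, find the known pathway with the highest Jaccard score.

    Assumption: each cluster is scored by its single best-matching pathway.
    The paper's actual pairing rule may differ.
    """
    scores = {}
    for cluster_name, genes in clusters.items():
        best_name, best_score = None, 0.0
        for pathway_name, pathway_genes in pathways.items():
            score = jaccard(genes, pathway_genes)
            if score > best_score:
                best_name, best_score = pathway_name, score
        scores[cluster_name] = (best_name, best_score)
    return scores


# Toy data (hypothetical labels, not from the study).
clusters = {
    "cluster_A": {"g1", "g2", "g3"},
    "cluster_B": {"g7", "g8"},
}
pathways = {
    "pathway_X": {"g2", "g3", "g4"},
    "pathway_Y": {"g8", "g9", "g10", "g11"},
}

# Rank clusters from best to worst match against the known pathways.
ranking = sorted(
    best_match_scores(clusters, pathways).items(),
    key=lambda item: item[1][1],
    reverse=True,
)
for cluster_name, (pathway_name, score) in ranking:
    print(cluster_name, pathway_name, round(score, 3))
```

Averaging such per-cluster scores for each algorithm would be one way to obtain a single ranking of methods. Whether the study aggregates scores this way is not stated in the abstract.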