A systematic comparison of genome scale clustering algorithms

Authors:
Jeremy J. Jay;John D. Eblen;Yun Zhang;Mikael Benson;Andy D. Perkins;Arnold M. Saxton;Brynn H. Voy;Elissa J. Chesler;Michael A. Langston
Affiliations:
The Jackson Laboratory, Bar Harbor ME;University of Tennessee, Knoxville TN;University of Tennessee, Knoxville TN;University of Göteborg, Göteborg, Sweden;Mississippi State University, Mississippi State MS;University of Tennessee, Knoxville TN;University of Tennessee, Knoxville TN;The Jackson Laboratory, Bar Harbor ME;University of Tennessee, Knoxville TN
Venue:
ISBRA'11 Proceedings of the 7th international conference on Bioinformatics research and applications
Year:
2011

Abstract

A wealth of clustering algorithms has been applied to gene coexpression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray data, and they have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Clusters are scored using Jaccard similarity coefficients, which measure the positive match of clusters to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies are warranted.
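
The Jaccard scoring mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rule of taking the best-matching pathway per cluster, the averaging step, and the toy gene and pathway identifiers are all assumptions made for illustration. The abstract states only that Jaccard similarity coefficients are used to match clusters to known pathways.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets score 0
    return len(a & b) / len(union)


def best_match_scores(clusters, pathways):
    """Map each cluster name to (best-matching pathway name, Jaccard score).

    Assumption: a cluster is credited with its single best-matching pathway.
    """
    scores = {}
    for cluster_name, genes in clusters.items():
        best_name, best_score = None, 0.0
        for pathway_name, pathway_genes in pathways.items():
            score = jaccard(genes, pathway_genes)
            if score > best_score:
                best_name, best_score = pathway_name, score
        scores[cluster_name] = (best_name, best_score)
    return scores


def mean_best_score(clusters, pathways):
    """Average of the per-cluster best scores (hypothetical summary statistic)."""
    scores = best_match_scores(clusters, pathways)
    if not scores:
        return 0.0
    return sum(score for _, score in scores.values()) / len(scores)


# Toy data with made-up identifiers, not real yeast genes or pathways.
clusters = {
    "c1": {"g1", "g2", "g3"},
    "c2": {"g5", "g6", "g7"},
}
pathways = {
    "p1": {"g1", "g2", "g3", "g4"},
    "p2": {"g5", "g6"},
}

if __name__ == "__main__":
    for name, (pathway, score) in best_match_scores(clusters, pathways).items():
        print(name, pathway, round(score, 3))
    print("mean best score:", round(mean_best_score(clusters, pathways), 3))
```

Under these assumptions, each algorithm's clustering could be summarized by a statistic such as `mean_best_score`, and the algorithms ranked by it. The study's actual aggregation may differ.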