Clustering algorithms optimizer: a framework for large datasets

Authors:
Roy Varshavsky; David Horn; Michal Linial
Affiliations:
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel; School of Physics and Astronomy, Tel Aviv University, Israel; Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
Venue:
ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
Year:
2007

Abstract

Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from two limitations: (i) they rely on predetermined parameters, such as a priori knowledge of the number of clusters; (ii) they involve nondeterministic procedures that yield inconsistent outcomes. A framework that addresses these shortcomings is therefore desirable. We provide a data-driven framework consisting of two interrelated steps. The first is SVD-based dimension reduction, and the second is automated tuning of the algorithm's parameter(s). The dimension reduction step is adapted to run efficiently on very large datasets. The optimal parameter setting is identified according to an internal evaluation criterion, the Bayesian Information Criterion (BIC). The framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate the effectiveness of the platform by incorporating the standard K-Means and the Quantum Clustering algorithms. The implementations are applied to several gene-expression benchmarks with significant success.
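The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices in it are assumptions made for illustration only: the function names (`svd_reduce`, `kmeans`, `bic`, `select_k`) are hypothetical, the reduction uses a plain full SVD rather than the large-dataset adaptation the abstract mentions, K-Means is started with a deterministic farthest-first rule so that repeated runs agree, and the BIC is computed for a hard-assignment spherical Gaussian model with one shared variance, which may differ from the form used in the paper. Only K-Means is shown; Quantum Clustering is omitted.

```python
import numpy as np


def svd_reduce(X, n_components):
    """Center X and project its rows onto the top right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's K-Means with a deterministic farthest-first start (assumption)."""
    centers = [X[0]]
    while len(centers) < k:
        # Squared distance from each point to its closest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to become empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(X, centers), centers


def bic(X, labels, centers):
    """BIC (lower is better) for a hard-assignment spherical Gaussian model.

    Assumed model: one shared variance, mixing weights n_j / n.
    """
    n, d = X.shape
    k = len(centers)
    sse = ((X - centers[labels]) ** 2).sum()
    var = max(sse / (n * d), 1e-12)  # shared variance, guarded against zero
    counts = np.bincount(labels, minlength=k)
    nz = counts[counts > 0]
    log_lik = (
        (nz * np.log(nz / n)).sum()
        - 0.5 * n * d * np.log(2 * np.pi * var)
        - sse / (2 * var)
    )
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, variance
    return -2.0 * log_lik + n_params * np.log(n)


def select_k(X, k_values, n_components=2):
    """Reduce X by SVD, then return the k with the lowest BIC and all scores."""
    Z = svd_reduce(X, n_components)
    scores = {}
    for k in k_values:
        labels, centers = kmeans(Z, k)
        scores[k] = bic(Z, labels, centers)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 3 groups, 50 features.
    rng = np.random.default_rng(0)
    true_centers = rng.normal(scale=10.0, size=(3, 50))
    X = np.vstack([c + rng.normal(size=(50, 50)) for c in true_centers])
    best_k, scores = select_k(X, range(1, 7))
    print("selected k:", best_k)
```

The parameter being tuned here is the number of clusters; the same loop could in principle score any other parameter of a clustering algorithm, provided the algorithm returns labels and representative centers for the BIC computation.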