This article introduces an agglomerative Bayesian model-based clustering algorithm that outputs a nested sequence of two-way cluster configurations for an input data matrix. Each two-way cluster configuration in the output hierarchy is specified by a row configuration and a column configuration whose Cartesian product partitions the data matrix. Variable selection is incorporated into the algorithm through a mixture model, which identifies the row clusters that form distinct groups defined by the column clusters. A primitive similarity measure between two clusters is the multiplicative change in model posterior probability implied by their merger, and the hierarchy is formed by iteratively merging the cluster pair that maximizes some fixed monotonic function of this quantity. A naive implementation of the algorithm would take this function to be the identity. However, when this naive algorithm is applied to gene expression data, where the number of genes studied typically far exceeds the number of experimental samples available, the imbalanced dimensionality of the data biases the algorithm toward merging samples. To counteract this bias, alternative functions of the similarity measure are considered that prevent degenerate behavior of the algorithm. The resulting improvements in the output cluster configurations are demonstrated on simulated data, and the method is then applied to real gene expression data. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
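The greedy merge loop described in the abstract can be sketched in Python. This is a minimal illustration under loudly stated assumptions, not the article's model: every block of the two-way partition is assumed to follow a conjugate Normal model (entries N(mu, SIGMA2), block mean mu ~ N(0, TAU2), with SIGMA2 and TAU2 fixed by us), the prior over configurations is taken as flat, and the mixture-model variable selection is omitted entirely. All function names are hypothetical. The similarity of a candidate pair is the log of the posterior ratio implied by the merge, and the `score` argument stands in for the monotonic function. Because applying one strictly increasing function to every candidate could not change which pair wins, the hook also receives the merge axis ("row" or "col"); this hook is our assumption, not the article's construction.

```python
import itertools
import math

import numpy as np

SIGMA2 = 1.0  # assumed known noise variance of each entry
TAU2 = 4.0    # assumed prior variance of each block mean


def block_log_marginal(x):
    """Log marginal likelihood of entries x ~ N(mu, SIGMA2), with mu ~ N(0, TAU2) integrated out."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    if n == 0:
        return 0.0
    s, ss = x.sum(), (x ** 2).sum()
    post_prec = n / SIGMA2 + 1.0 / TAU2  # posterior precision of mu
    return (-0.5 * n * math.log(2 * math.pi * SIGMA2)
            - 0.5 * math.log(TAU2 * post_prec)
            - 0.5 * ss / SIGMA2
            + 0.5 * (s / SIGMA2) ** 2 / post_prec)


def log_posterior(X, rows, cols):
    """Unnormalised log posterior: sum over all blocks of the row x column Cartesian product (flat prior assumed)."""
    return sum(block_log_marginal(X[np.ix_(r, c)]) for r in rows for c in cols)


def merge_log_ratio(X, rows, cols, axis, i, j):
    """Log of the multiplicative change in posterior from merging clusters i and j on the given axis.

    Only the blocks that touch clusters i and j change, so only those are recomputed.
    """
    total = 0.0
    if axis == "row":
        a, b = rows[i], rows[j]
        for c in cols:
            total += (block_log_marginal(X[np.ix_(a + b, c)])
                      - block_log_marginal(X[np.ix_(a, c)])
                      - block_log_marginal(X[np.ix_(b, c)]))
    else:
        a, b = cols[i], cols[j]
        for r in rows:
            total += (block_log_marginal(X[np.ix_(r, a + b)])
                      - block_log_marginal(X[np.ix_(r, a)])
                      - block_log_marginal(X[np.ix_(r, b)]))
    return total


def two_way_hierarchy(X, score=lambda log_ratio, axis: log_ratio):
    """Agglomerate from singletons; each step merges the row or column pair with the highest score.

    The default score is the naive identity. Returns the nested list of (rows, cols) configurations.
    """
    rows = [[i] for i in range(X.shape[0])]
    cols = [[j] for j in range(X.shape[1])]
    history = [([list(r) for r in rows], [list(c) for c in cols])]
    while len(rows) > 1 or len(cols) > 1:
        best = None
        for axis, clusters in (("row", rows), ("col", cols)):
            for i, j in itertools.combinations(range(len(clusters)), 2):
                s = score(merge_log_ratio(X, rows, cols, axis, i, j), axis)
                if best is None or s > best[0]:
                    best = (s, axis, i, j)
        _, axis, i, j = best
        clusters = rows if axis == "row" else cols
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
        history.append(([list(r) for r in rows], [list(c) for c in cols]))
    return history
```

Calling `two_way_hierarchy(X)` on an n-by-p matrix returns n + p - 1 nested configurations, from all singletons down to a single block. To experiment with counteracting the dimensional imbalance, one could pass an axis-aware score, for example a hypothetical rescaling such as `score=lambda lr, axis: lr / (X.shape[1] if axis == "row" else X.shape[0])`; this is only an illustration of the hook, not one of the alternative functions the article considers.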