Robust Bayesian Clustering for Replicated Gene Expression Data

Authors:
Jianyong Sun; Jonathan M. Garibaldi; Kim Kenobi
Affiliations:
The University of Nottingham, Sutton Bonington and The University of Nottingham, Nottingham; University of Nottingham, Nottingham; The University of Nottingham, Sutton Bonington
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Experimental scientific data sets, especially in biology, usually contain replicated measurements. Replicated measurements of the same object are correlated, and this correlation must be handled carefully in any analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to cluster the data points accurately by taking the replicated measurements into account, but also to identify outliers (i.e., scattered objects) that may warrant further study. A tree-structured variational Bayes (VB) algorithm is developed for model fitting. Experimental studies show that our model compares favorably with the infinite Gaussian mixture model while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model in terms of improved outlier detection rates under varying levels of measurement uncertainty. Finally, we apply the approach to clustering transcriptomics (mRNA expression) data sets with replicated measurements.
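
The idea of clustering objects with replicated measurements while flagging outliers can be illustrated with a small sketch. This is not the model or the tree-structured VB algorithm of the paper. It is a simplified Student-t mixture E-step under assumptions made here for illustration only: all replicates of an object share one cluster label and one latent scale variable u, the replicates are conditionally independent given u, the component covariances are spherical, and the mixture parameters are fixed rather than learned. The function names (replicate_t_logpdf, e_step) and the toy data are hypothetical. A small expected scale marks an object as a likely outlier.

```python
import math
import numpy as np


def replicate_t_logpdf(X, mu, sigma2, nu):
    """Log density of the R replicates in X (shape R x d) of one object.

    Illustrative assumption: x_r | u ~ N(mu, (sigma2 / u) I) for every
    replicate r, with a shared scale u ~ Gamma(nu / 2, nu / 2).
    Integrating out u gives a Student-t density over all R * d values.
    """
    n = X.size  # R * d stacked dimensions
    S = np.sum((X - mu) ** 2) / sigma2  # summed squared Mahalanobis distance
    return (math.lgamma((nu + n) / 2) - math.lgamma(nu / 2)
            - 0.5 * n * math.log(nu * math.pi * sigma2)
            - 0.5 * (nu + n) * math.log1p(S / nu))


def e_step(objects, weights, mus, sigma2s, nu):
    """Per-object cluster responsibilities and expected latent scale."""
    resp, scale = [], []
    for X in objects:
        logp = np.array([math.log(w) + replicate_t_logpdf(X, m, s2, nu)
                         for w, m, s2 in zip(weights, mus, sigma2s)])
        r = np.exp(logp - logp.max())  # subtract the max for stability
        r /= r.sum()
        S = np.array([np.sum((X - m) ** 2) / s2
                      for m, s2 in zip(mus, sigma2s)])
        u = (nu + X.size) / (nu + S)  # E[u | object, cluster k]
        resp.append(r)
        scale.append(float(r @ u))  # average over the cluster posterior
    return np.array(resp), np.array(scale)


# Toy data (made up): three objects, three replicates each, two dimensions.
objects = [
    np.array([[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]),        # near cluster 0
    np.array([[5.2, 4.9], [4.8, 5.1], [5.0, 5.3]]),          # near cluster 1
    np.array([[20.0, -20.0], [21.0, -19.0], [19.0, -21.0]]),  # scattered
]
weights = [0.5, 0.5]
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
sigma2s = [1.0, 1.0]

resp, scale = e_step(objects, weights, mus, sigma2s, nu=4.0)
labels = resp.argmax(axis=1)
outlier = int(scale.argmin())  # object with the smallest expected scale
```

Pooling the replicates before computing the expected scale is the point of the sketch: an object whose replicates are all far from every cluster centre accumulates a large summed distance and a small scale, whereas a single noisy replicate among otherwise typical ones has less influence.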