The Latent Process Decomposition of cDNA Microarray Data Sets

Authors:
Simon Rogers;Mark Girolami;Colin Campbell;Rainer Breitling
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2005

Citing 1

Latent Dirichlet allocation

The Journal of Machine Learning Research

Cited 11

Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Expression microarray classification using topic models

Proceedings of the 2010 ACM Symposium on Applied Computing
Biologically-aware latent Dirichlet allocation (BaLDA) for the classification of expression microarray

PRIB'10 Proceedings of the 5th IAPR international conference on Pattern recognition in bioinformatics
Biclustering of expression microarray data using affinity propagation

PRIB'11 Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics
A comparison on score spaces for expression microarray data classification

PRIB'11 Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics
On class visualisation for high dimensional data: exploring scientific data sets

DS'06 Proceedings of the 9th international conference on Discovery Science
Combining information theoretic kernels with generative embeddings for classification

Neurocomputing
Investigating Topic Models' Capabilities in Expression Microarray Data Classification

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Feature selection using counting grids: application to microarray data

SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
Classification of Alzheimer Diagnosis from ADNI Plasma Biomarker Data

Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Exploiting geometry in counting grids

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition


Abstract

We present a new computational technique that enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/. We introduce a hierarchical Bayesian model, called Latent Process Decomposition (LPD), in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it allows the optimal number of sample clusters to be assessed objectively. Second, it represents samples and gene expression levels using a common set of latent variables, whereas dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced-space representations. Third, in contrast to standard cluster models, observations are not assigned to a single cluster; gene expression levels, for example, are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes that are known to be medically significant. To demonstrate its wider applicability, we also show its performance on a microarray data set for yeast.
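
The idea of representing each sample as a mixture over latent processes can be made concrete with a small generative sketch. The abstract does not state the distributions involved, so the choices below are assumptions made for illustration only: a Dirichlet prior over each sample's process proportions (by analogy with latent Dirichlet allocation, listed above) and a Gaussian expression level per gene and process. The function names and the toy parameters `alpha`, `mu`, and `sigma` are hypothetical, and the code only simulates data; it does not perform the variational parameter estimation mentioned in the abstract.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_mixture_data(n_samples, n_genes, alpha, mu, sigma, seed=0):
    """Simulate expression values under an assumed latent-process mixture.

    alpha       : Dirichlet parameters, one per latent process (assumption)
    mu[g][k]    : mean expression of gene g under process k (assumption)
    sigma[g][k] : standard deviation of gene g under process k (assumption)
    """
    rng = random.Random(seed)
    n_processes = len(alpha)
    thetas, data = [], []
    for _ in range(n_samples):
        # Each sample gets its own proportions over the latent processes.
        theta = sample_dirichlet(alpha, rng)
        row = []
        for g in range(n_genes):
            # Each gene in the sample picks a process according to theta,
            # so a sample is never tied to a single cluster.
            k = rng.choices(range(n_processes), weights=theta)[0]
            row.append(rng.gauss(mu[g][k], sigma[g][k]))
        thetas.append(theta)
        data.append(row)
    return thetas, data


# Hypothetical toy setting: 3 latent processes, 5 genes, 4 samples.
alpha = [1.0, 1.0, 1.0]
mu = [[-2.0, 0.0, 2.0] for _ in range(5)]
sigma = [[0.5, 0.5, 0.5] for _ in range(5)]
thetas, data = generate_mixture_data(4, 5, alpha, mu, sigma)
```

In this sketch, `thetas` holds one vector of process proportions per sample and `data` holds the simulated expression matrix (samples by genes). A hard clustering would correspond to every `theta` placing all of its weight on one process; the mixture form lets a sample draw on several.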