Biologically valid linear factor models of gene expression

Authors:
Mark Girolami;Rainer Breitling
Affiliations:
Bioinformatics Research Centre, Department of Computing Science;Bioinformatics Research Centre, Department of Computing Science
Venue:
Bioinformatics
Year:
2004

Citing 0
Cited 6

Independent arrays or independent time courses for gene expression time series data analysis
Neurocomputing

Combinatorial genetic regulatory network analysis tools for high throughput transcriptomic data
RECOMB'05 Proceedings of the 2005 joint annual satellite conference on Systems biology and regulatory genomics

A Weighted Principal Component Analysis and Its Application to Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Tree-Dependent components of gene expression data for clustering
ICANN'06 Proceedings of the 16th international conference on Artificial Neural Networks - Volume Part II

The linear factorial smoothing for the analysis of incomplete data
PReMI'05 Proceedings of the First international conference on Pattern Recognition and Machine Intelligence

Topographic independent component analysis of gene expression time series data
ICA'06 Proceedings of the 6th international conference on Independent Component Analysis and Blind Signal Separation

Abstract

Motivation: Identifying the physiological processes that underlie and generate the expression patterns observed in microarray experiments is a major challenge. Principal component analysis (PCA) is a linear multivariate statistical method that is regularly employed for this purpose, as it provides a reduced-dimensional representation for subsequent study of the biological processes that may be responding to the particular experimental conditions. Making explicit the data assumptions underlying PCA highlights their lack of biological validity, which makes biological interpretation of the principal components problematic. A microarray data representation that enables clear biological interpretation is a desirable analysis tool.

Results: We address this issue by employing the probabilistic interpretation of PCA and proposing alternative linear factor models that are based on refined biological assumptions. A practical study on two well-understood microarray datasets highlights the weakness of PCA and the greater biological interpretability of the linear models we have developed.

Availability: The model estimation routines are currently implemented as Matlab routines. These, together with the data and the results reported, are available from the following URL: http://www.dcs.gla.ac.uk/~girolami/lfm/index.html
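To make the "probabilistic interpretation of PCA" mentioned in the abstract concrete, the following is a minimal Python sketch of the standard closed-form maximum-likelihood solution for probabilistic PCA (a Gaussian latent factor model with isotropic noise). It is not the authors' Matlab code and does not implement their alternative factor models. The function names `ppca_ml` and `posterior_mean`, the synthetic data, and the choice of two factors are all assumptions made for illustration.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood probabilistic PCA.

    X : (n, d) array, e.g. n genes measured on d arrays (an assumed layout).
    q : number of latent factors, with q < d.
    Returns the mean (d,), loading matrix W (d, q) and noise variance sigma2.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Sample covariance with the ML normalisation (divide by n).
    S = Xc.T @ Xc / n
    eigvals, eigvecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loadings: leading eigenvectors scaled by sqrt(lambda_j - sigma2).
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def posterior_mean(X, mu, W, sigma2):
    """Posterior mean of the latent factors for each row of X."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return (X - mu) @ W @ np.linalg.inv(M)

# Synthetic stand-in for an expression matrix (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_genes, n_arrays, n_factors = 200, 10, 2
true_loadings = rng.normal(size=(n_arrays, n_factors))
latent = rng.normal(size=(n_genes, n_factors))     # Gaussian factors, as PCA implicitly assumes
X = latent @ true_loadings.T + 0.1 * rng.normal(size=(n_genes, n_arrays))

mu, W, sigma2 = ppca_ml(X, n_factors)
Z = posterior_mean(X, mu, W, sigma2)               # reduced-dimensional representation
```

The sketch makes the modelling assumptions explicit: the latent factors are Gaussian and the noise is isotropic. These are the kind of assumptions whose biological validity the abstract calls into question.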