A datamining approach to cell population deconvolution from gene expressions using particle filters

  • Authors:
  • Sushmita Roy;Terran Lane;Margaret Werner-Washburne

  • Affiliations:
  • University of New Mexico, Albuquerque, NM;University of New Mexico, Albuquerque, NM;University of New Mexico, Albuquerque, NM

  • Venue:
  • Proceedings of the 5th international workshop on Bioinformatics
  • Year:
  • 2005

Abstract

Microarrays generally measure gene expressions from a mixture of cell subpopulations in different stages of a biological process. However, little or no information about these subpopulations is actually incorporated in existing data analyses. Estimation of these subpopulation proportions is important for measuring the extent of synchrony in the entire population. Based upon the gene expression specific to individual subpopulations, genes can be clustered and assigned functions. The relative abundance of the cellular subpopulations also reveals phenotypic information of mutant populations that is valuable for studies of genetic diseases such as cancer. Thus, the quantification of subpopulation proportions is important, not only as a reliability measure of microarray data but also because of its potential relevance to functional analysis and biomedical and clinical applications.

In this paper, we describe a novel approach to model a biological process that provides (i) a maximum a posteriori (MAP) estimate of the subpopulations given the gene expression, (ii) stage-specific gene expression values and (iii) a gene clustering method based on their stage-specific expression. We have applied our approach to model the yeast cell-cycle and have extracted profiles of the population dynamics for different stages of the cell-cycle. Evaluation of statistical validity of our results using bootstrapped confidence tests reveals that our model captures significant temporal dynamics of the data. Our results are in agreement with existing biological knowledge and are reproducible in multiple runs of our algorithm.
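
The abstract does not spell out the model, so the following is only a minimal sketch of how a particle filter could track subpopulation proportions over time, not the authors' method. It assumes a linear mixing model in which the measured expression of each gene is a proportion-weighted sum of known stage-specific expression values plus Gaussian noise. The matrix `S`, the noise level `sigma`, the random-walk step `jitter`, and all function names are hypothetical choices made for illustration. The highest-weight particle at each time point is used as a crude stand-in for a MAP estimate.

```python
import math
import random


def normalize(v):
    """Rescale a positive vector so that it sums to 1 (a point on the simplex)."""
    total = sum(v)
    return [x / total for x in v]


def mix(S, p):
    """Assumed mixing model: each gene is a proportion-weighted sum over stages."""
    return [sum(s * q for s, q in zip(row, p)) for row in S]


def log_likelihood(y, S, p, sigma):
    """Gaussian log-likelihood (up to a constant) of observation y given proportions p."""
    pred = mix(S, p)
    return -sum((yi - mi) ** 2 for yi, mi in zip(y, pred)) / (2 * sigma ** 2)


def particle_filter(observations, S, n_particles=500, sigma=0.1, jitter=0.05, seed=0):
    """Bootstrap particle filter over proportion vectors; returns one estimate per time point."""
    rng = random.Random(seed)
    k = len(S[0])
    # Initial particles: uniform on the simplex via normalized exponential draws.
    particles = [normalize([rng.expovariate(1.0) for _ in range(k)])
                 for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate: random-walk perturbation, clipped positive, then renormalized.
        particles = [normalize([max(x + rng.gauss(0.0, jitter), 1e-9) for x in p])
                     for p in particles]
        # Weight each particle by how well it explains the observed expression.
        logw = [log_likelihood(y, S, p, sigma) for p in particles]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        # Highest-weight particle: a rough approximation to the MAP estimate.
        estimates.append(list(particles[logw.index(top)]))
        # Resample in proportion to the weights.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    return estimates


# Hypothetical toy data: 3 genes, 2 stages, proportions drifting over 4 time points.
S_demo = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.2]]
truth = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]
observations = [mix(S_demo, p) for p in truth]  # noise-free for simplicity
estimates = particle_filter(observations, S_demo)
```

In this toy setting the stage-specific expression matrix is given. The approach described in the abstract also estimates stage-specific expression values, which this sketch does not attempt.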