On Determination of Minimum Sample Size for Discovery of Temporal Gene Expression Patterns

  • Authors:
  • Fang-Xiang Wu; W. J. Zhang; Anthony J. Kusalik

  • Affiliations:
  • University of Saskatchewan, Canada; University of Saskatchewan, Canada; University of Saskatchewan, Canada

  • Venue:
  • IMSCCS '06: Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), Volume 1
  • Year:
  • 2006

Abstract

DNA microarray technologies allow thousands of genes to be monitored simultaneously, revealing important information about cellular and tissue expression phenotypes. From a data-analysis viewpoint, microarray experiments fall into three classes: (1) classification of samples into patients and non-patients, or into finer subtypes, based on gene expression; (2) discovery of gene expression patterns across a set of different conditions; and (3) discovery of gene expression patterns in the same tissue over a series of time points as the underlying biological process evolves. This article concerns the third class. An important feature of this class is the dependency among expression measurements at successive time points. A key issue here is the specification of the time points, namely (1) the number of time points and (2) the span between them. When the biologist cannot supply this specification, one naturally asks whether the progressively generated data can themselves determine a "cut-off" point beyond which further microarray experiments no longer contribute to pattern discovery. Such a cut-off also gives the minimum sample size, which matters because these experiments are costly in both time and reagents. We have developed a method for determining the minimum sample size (that is, the minimum number of time points) for temporal gene expression experiments, assuming that the span between time points is given and that hierarchical clustering is used for pattern discovery. The basic idea is to define a similarity measure between the clusterings obtained at two time points and to express it as a function of the number of time points as the experiment progresses.
While the experiment is running, this function is evaluated to see whether it reaches a "saturated" state in which further experiments no longer help discriminate between patterns. The method has been verified on two previously published gene expression datasets; for both, the number of time points determined by our method is smaller than the number used in the original experiments. Although the method currently employs hierarchical clustering, its overall idea is applicable to other clustering techniques.
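
The saturation test described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: genes are clustered by average-linkage agglomerative clustering on Euclidean distance using only the first k time points, the dendrogram is cut at a fixed number of clusters, the Rand index stands in for the authors' similarity measure, and the hypothetical parameters `threshold` and `patience` decide when the similarity curve counts as saturated. All function names and the toy profiles are invented for illustration and are not from the datasets mentioned above.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering cut at n_clusters; one label per gene."""
    clusters = [[i] for i in range(len(profiles))]

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair; combinations yields a < b, so deleting b keeps a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of gene pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def minimum_time_points(data, n_clusters, threshold=0.95, patience=2, start=2):
    """Smallest k whose clustering stays similar for `patience` further time points.

    data: one expression profile (list of numbers) per gene, in time order.
    Returns None if the similarity never saturates within the available time points.
    """
    previous, stable = None, 0
    for k in range(start, len(data[0]) + 1):
        # Cluster genes using only the first k time points.
        labels = agglomerative([row[:k] for row in data], n_clusters)
        if previous is not None:
            if rand_index(previous, labels) >= threshold:
                stable += 1
                if stable >= patience:
                    return k - patience
            else:
                stable = 0
        previous = labels
    return None


if __name__ == "__main__":
    # Toy data: two gene groups that only diverge from the third time point on.
    toy = [
        [0, 0, 5, 5, 5, 5],
        [0, 0, 5, 5, 5, 5],
        [0, 0, -5, -5, -5, -5],
        [0, 0, -5, -5, -5, -5],
    ]
    print(minimum_time_points(toy, n_clusters=2))  # prints 3
```

In the toy data the two groups are indistinguishable over the first two time points, so the clustering changes when the third point is added and then stays fixed; with `patience=2` the sketch reports three time points. Requiring several consecutive high-similarity comparisons, rather than a single one, is a design choice intended to avoid stopping on one chance agreement.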