A new framework for identifying differentially expressed genes

  • Authors:
  • Jie Li; Xianglong Tang; Wei Zhao; Jianhua Huang

  • Affiliations:
  • Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (all authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2007

Abstract

Microarrays have been widely used to classify cancer samples and to discover biological types, for example tumor versus normal phenotypes in cancer research. One of the challenging scientific tasks of the post-genomic era is identifying a subset of differentially expressed genes among the thousands of genes in microarray data; such a subset helps us understand the underlying molecular mechanisms of diseases, diagnose diseases accurately, and identify novel therapeutic targets. In this paper, we propose a new framework for identifying differentially expressed genes, in which genes are ranked according to their residuals. The performance of the framework is assessed by applying it to several public microarray data sets. Experimental results show that the proposed method gives a more robust and accurate ranking than other statistical tests, such as the t-test, the Wilcoxon rank-sum test, and the KS test. Another novelty of the method is an algorithm for selecting a small subset of genes that show significant variation in expression ("outlier" genes). The number of genes in this subset can be controlled via an adjustable confidence-level window. In addition, the results of the proposed method can be visualized: by inspecting the residual plot, we can easily find genes that vary significantly between the two groups of samples and see the degree of differential expression of each gene. Through a comparison study, we found several "outlier" genes that had been verified in previous biological experiments but were either not identified by other methods or ranked lower by standard statistical tests.
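
The abstract does not say which model the residuals come from, so the following is only an illustrative sketch of residual-based ranking, not the authors' algorithm. It assumes that each gene's mean expression in one group is regressed on its mean in the other group with an ordinary least-squares line, that genes are ranked by absolute residual, and that the confidence-level window is a normal-quantile cutoff on the residual spread. The function names `rank_by_residual` and `select_outliers` and the `confidence` parameter are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def rank_by_residual(group_a, group_b):
    """Rank genes by absolute residual (largest first).

    group_a, group_b: dict mapping gene name -> list of expression values.
    Assumption: residuals come from a least-squares line fitted to the
    per-gene group means; the paper's actual model may differ.
    """
    genes = sorted(group_a)
    xs = [mean(group_a[g]) for g in genes]
    ys = [mean(group_b[g]) for g in genes]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = {
        g: y - (intercept + slope * x) for g, x, y in zip(genes, xs, ys)
    }
    return sorted(residuals.items(), key=lambda kv: abs(kv[1]), reverse=True)


def select_outliers(ranked, confidence=0.95):
    """Keep genes whose residual falls outside the confidence window.

    Assumption: the window is +/- z * (residual standard deviation), with z
    the two-sided normal quantile for the given confidence level.
    """
    spread = pstdev([r for _, r in ranked])
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return [g for g, r in ranked if abs(r) > z * spread]
```

Under these assumptions, raising `confidence` widens the window and can only shrink the selected subset, which mirrors the abstract's point that the number of "outlier" genes is controlled through the confidence level.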