Discovering significant and interpretable patterns from multifactorial DNA microarray data with poor replication

  • Authors:
  • Ju Han Kim;Dooil Jeoung;Seongeun Lee;Hyeouneui Kim

  • Affiliations:
  • Seoul National University Biomedical Informatics (SNUBI), Seoul 110-799, Republic of Korea and Human Genome Research Institute, Seoul National University College of Medicine, Seoul 110-799, Republ ...;Department of Microbiology, Kangwon National University, Chuncheon 200-701, Republic of Korea;In2Gen, 28 Yongon-dong Chongno-gu, Seoul 110-799, Republic of Korea;Graduate Program in Health Informatics, University of Minnesota, Minneapolis, MN

  • Venue:
  • Journal of Biomedical Informatics - Special issue: Biomedical machine learning
  • Year:
  • 2004

Abstract

Motivation. Multivariate analyses are advantageous for the simultaneous testing of the separate and combined effects of many variables and of their interactions. In factorial designs with many factors and/or levels, however, sufficient replication is often prohibitively costly. Furthermore, complicated statements are often required for the biological interpretation of the higher-order interactions determined by standard statistical techniques like analysis of variance.

Results. Because we are usually interested in finding factor-specific effects or their interactions, we assumed that the observed expression profile of a gene is a manifestation of an underlying factor-specific generative pattern (FSGP) combined with noise. Thus, a genetic algorithm was created to find the nearest FSGP for each expression profile. We then measured the distance between each profile and the corresponding nearest FSGP. Permutation testing for the distance measures successfully identified those genes with statistically significant profiles, thus yielding straightforward biological interpretations. Association networks of genes, drugs, and cell lines were created as tripartite graphs, representing significant and interpretable relations, by using a microarray experiment of gastric-cancer cell lines with a factorial design and no replication. The proposed method may benefit the combined analysis of heterogeneous expression data from the growing public repositories.
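The nearest-pattern search and permutation test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only. A candidate pattern is assumed to be a binary indicator over a two-factor design (for example, cell lines by drugs). The distance is assumed to be one minus the Pearson correlation. Exhaustive enumeration replaces the genetic algorithm, which is feasible only because the toy design is tiny. The null distribution is assumed to come from shuffling a profile's values across conditions. All function names and the example profile are hypothetical.

```python
import itertools
import math
import random


def candidate_patterns(n_rows, n_cols):
    """Binary patterns that are 1 where a row subset meets a column subset.

    Assumption: this stands in for the set of factor-specific patterns.
    """
    def subsets(n):
        return [s for k in range(1, n + 1)
                for s in itertools.combinations(range(n), k)]

    patterns = []
    for rows in subsets(n_rows):
        for cols in subsets(n_cols):
            if len(rows) == n_rows and len(cols) == n_cols:
                continue  # an all-ones pattern has no factor-specific contrast
            patterns.append(tuple(
                1.0 if (i in rows and j in cols) else 0.0
                for i in range(n_rows) for j in range(n_cols)))
    return patterns


def correlation_distance(x, y):
    """One minus the Pearson correlation (assumed distance measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # correlation is undefined for a constant vector
    return 1.0 - sxy / math.sqrt(sxx * syy)


def nearest_pattern(profile, patterns):
    """Exhaustive search (replaces the genetic algorithm in this sketch)."""
    return min(((correlation_distance(profile, p), p) for p in patterns),
               key=lambda pair: pair[0])


def permutation_p_value(profile, patterns, n_perm=1000, seed=0):
    """Share of shuffled profiles at least as close to some pattern."""
    rng = random.Random(seed)
    observed = nearest_pattern(profile, patterns)[0]
    shuffled = list(profile)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "as close".
        if nearest_pattern(shuffled, patterns)[0] <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical profile: 3 cell lines (rows) by 4 drugs (columns), flattened
# row by row, with high expression in rows 0-1 under drugs 0-2.
profile = [
    1.02, 0.97, 1.05, 0.03,
    0.95, 1.01, 0.98, -0.02,
    0.04, -0.01, 0.02, 0.00,
]
patterns = candidate_patterns(3, 4)
distance, pattern = nearest_pattern(profile, patterns)
p_value = permutation_p_value(profile, patterns, n_perm=1000, seed=0)
print(distance, pattern, p_value)
```

In this sketch the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations limits the resolution of the test. Exhaustive enumeration grows exponentially with the number of factor levels, which is one plausible reason a heuristic search such as a genetic algorithm would be preferred for larger designs.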