Parallel Information-Theory-Based Construction of Genome-Wide Gene Regulatory Networks

  • Authors:
  • Jaroslaw Zola; Maneesha Aluru; Abhinav Sarje; Srinivas Aluru

  • Affiliations:
  • Iowa State University, Ames; Iowa State University, Ames; Iowa State University, Ames; Iowa State University, Ames and Indian Institute of Technology Bombay, India

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2010

Abstract

Constructing genome-wide gene regulatory networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, none of them is parallel, and they do not scale to the whole-genome level or incorporate the largest data sets, particularly with rigorous statistical techniques. In this paper, we present a parallel method that integrates mutual information, the data processing inequality, and statistical testing to detect significant dependencies between genes, and that efficiently exploits the parallelism inherent in such computations. We present a new method to carry out permutation testing for assessing the statistical significance of interactions, while reducing its computational complexity by a factor of $\Theta(n^2)$, where $n$ is the number of genes. Using both synthetic and known regulatory networks, we show that our method produces networks of quality similar to ARACNe, a widely used mutual-information-based method. We further explore the use of accelerators for gene network construction by presenting a parallelization on a cluster of IBM Cell blades. We exploit parallelization across multiple Cells, multiple cores within each Cell, and vector units within the cores to develop a high-performance implementation that effectively addresses the scaling problem. We report the first inference of a whole-genome network of a plant by constructing a 15,222-gene network of Arabidopsis thaliana from 3,137 microarray experiments in 30 minutes on a 2,048-CPU IBM Blue Gene/L, and in 2 hours and 25 minutes on an 8-node Cell blade cluster.
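The three ingredients named in the abstract (pairwise mutual information, a permutation-based significance threshold, and pruning by the data processing inequality) can be illustrated with a small serial sketch. This is not the authors' implementation: the histogram estimator, the single pooled null distribution, and every function and parameter name below (`mutual_information`, `permutation_threshold`, `apply_dpi`, `bins`, `n_perm`, `alpha`) are assumptions chosen for brevity, and the abstract does not say how the paper estimates mutual information or how it organizes its permutation test.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of mutual information, in nats.

    Assumption: a simple equal-width histogram estimator stands in for
    whatever estimator the paper actually uses.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def mi_matrix(expr, bins=8):
    """Symmetric matrix of pairwise MI; rows of expr are genes."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    return mi


def permutation_threshold(expr, n_perm=200, alpha=0.01, bins=8, seed=0):
    """MI cutoff taken from one pooled null distribution.

    Assumption: shuffling one profile of a randomly chosen gene pair
    destroys any dependence, and the same null is reused for all pairs.
    """
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        i, j = rng.choice(n, size=2, replace=False)
        null[k] = mutual_information(expr[i], rng.permutation(expr[j]), bins)
    return float(np.quantile(null, 1.0 - alpha))


def apply_dpi(mi, adj):
    """Drop the weakest edge of every fully connected gene triple.

    All removals are decided against the unpruned graph, then applied.
    """
    n = mi.shape[0]
    remove = np.zeros_like(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if not adj[i, j]:
                continue
            for k in range(n):
                if k == i or k == j:
                    continue
                if adj[i, k] and adj[j, k] and mi[i, j] < min(mi[i, k], mi[j, k]):
                    remove[i, j] = remove[j, i] = True
                    break
    return adj & ~remove


def infer_network(expr, bins=8, n_perm=200, alpha=0.01, seed=0):
    """Boolean adjacency matrix: threshold MI, then prune with DPI."""
    mi = mi_matrix(expr, bins)
    adj = mi > permutation_threshold(expr, n_perm, alpha, bins, seed)
    np.fill_diagonal(adj, False)
    return apply_dpi(mi, adj)
```

A typical call passes a genes-by-experiments array, for example `adj = infer_network(expr)`, and reads `adj[i, j]` as a predicted interaction between genes `i` and `j`. The pairwise loop in `mi_matrix` is the part that parallelizes naturally, since each pair is independent of the others. Reusing one null distribution is only one simple way to avoid repeating the permutation test for each of the roughly n^2 pairs; whether it corresponds to the paper's scheme is not stated in the abstract.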