Parallel stochastic simulations of budding yeast cell cycle: load balancing strategies and theoretical analysis

  • Authors:
  • Tae-Hyuk Ahn; Adrian Sandu

  • Affiliations:
  • State University, Blacksburg VA; State University, Blacksburg VA

  • Venue:
  • Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2010

Abstract

The evolution of biochemical systems in which some chemical species are present in only small numbers of molecules is strongly influenced by discrete and stochastic effects. Continuous, deterministic models cannot capture this evolution accurately, so special stochastic models are required. The budding yeast cell cycle is an excellent example of the need to capture stochastic effects in biochemical reactions. To obtain statistics of the cell evolution, a stochastic simulation algorithm must be run thousands of times with different initial conditions and parameter values. To manage the computational expense, this large ensemble of runs needs to be executed in parallel, with each individual task being one stochastic simulation. The CPU time per task is not known in advance and can vary considerably from one simulation to another. Because of this variability, serious load imbalances appear and may considerably reduce the efficiency of the parallel computation. This paper proposes two dynamic load balancing strategies for parallel runs of large ensembles of stochastic simulations of biological systems. A new probabilistic analysis framework is developed to quantify the performance of the load balancing algorithms when the CPU times per task are not known in advance. Simulation results with a stochastic budding yeast cell cycle model confirm the theoretical analysis. While this work is motivated by cell cycle modeling, the proposed analysis framework is general and can be applied directly to any ensemble simulation where many tasks are mapped onto each processor and where the individual compute times vary considerably among tasks.
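
The load imbalance described above can be illustrated with a small scheduling sketch. The abstract does not specify the paper's two strategies, so the code below is not the authors' method: it compares a static round-robin assignment with a generic dynamic scheme in which each task goes to whichever worker becomes idle first. The function names, the exponential task-time distribution, and the task and worker counts are all assumptions chosen for illustration.

```python
import heapq
import random


def static_makespan(task_times, n_workers):
    """Pre-assign tasks round-robin; the makespan is the busiest worker's total."""
    loads = [0.0] * n_workers
    for i, t in enumerate(task_times):
        loads[i % n_workers] += t
    return max(loads)


def dynamic_makespan(task_times, n_workers):
    """Hand each task, in order, to whichever worker becomes idle first."""
    finish_times = [0.0] * n_workers  # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for t in task_times:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + t)
    return max(finish_times)


rng = random.Random(0)
# Hypothetical task times: exponential with mean 1, standing in for
# simulations whose CPU time varies considerably (an assumption).
task_times = [rng.expovariate(1.0) for _ in range(1000)]
n_workers = 16

# Lower bound on any schedule: total work divided evenly among workers.
ideal = sum(task_times) / n_workers
print("ideal  :", ideal)
print("static :", static_makespan(task_times, n_workers))
print("dynamic:", dynamic_makespan(task_times, n_workers))
```

In this toy model the dynamic scheme can finish at most one task length later than the ideal even split, whereas a static assignment carries no such guarantee when task times are unknown. Real implementations would also pay a communication cost per task request, which this sketch ignores.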