Dimensioning the virtual cluster for parallel scientific workflows in clouds

  • Authors:
  • Daniel de Oliveira; Vitor Viana; Eduardo Ogasawara; Kary Ocana; Marta Mattoso

  • Affiliations:
  • UFF - Fluminense Federal University, Niteroi, Brazil; COPPE/UFRJ, Rio de Janeiro, Brazil; CEFET/RJ, Rio de Janeiro, Brazil; COPPE/UFRJ, Rio de Janeiro, Brazil; COPPE/UFRJ, Rio de Janeiro, Brazil

  • Venue:
  • Proceedings of the 4th ACM workshop on Scientific cloud computing
  • Year:
  • 2013

Abstract

Cloud computing has established itself as a solid computational model that allows scientists to use a series of distributed virtual resources to execute a wide range of scientific experiments. In several cases, these experiments demand high performance, since many of their activities are data and compute intensive. Parallelism techniques are therefore a key issue in this experimentation process. There are approaches that provide parallelism capabilities for scientific workflows in clouds. However, most of them rely on the scientist to dimension the virtual cluster to be instantiated. Dimensioning the virtual cluster to execute the workflow in parallel may be a hard task, i.e., it is hard to define and adapt the optimal number of virtual machines to be used. Most systems keep the scientist's manual configuration for the whole workflow execution, using adaptive techniques only in the presence of failures. Due to the huge number of options (virtual machine types) available to configure a cloud environment, the configuration task commonly becomes impractical to perform manually, and if the configuration is not adjusted adaptively during the execution, it can degrade workflow performance or produce an excessive increase in financial cost. This paper proposes a service called SciDim, which is based on a multi-objective cost function allied to genetic algorithms and provenance data, to help determine an "ideal" initial configuration for the virtual cluster under budget and deadline constraints set by the scientist.
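
The abstract does not give SciDim's cost function or its genetic operators, so the following is only a minimal Python sketch of the general idea: a genetic algorithm searching for a virtual cluster configuration that minimizes a combined time and money cost under a deadline and a budget. Everything concrete here is an assumption made for illustration and is not taken from the paper: the VM catalog, its prices and relative speeds, the linear speedup model, per-started-hour billing, the weighted-sum form of the cost function with weight `alpha`, and the function names `estimate`, `cost`, and `dimension`. In SciDim the execution time estimates would come from provenance data rather than from a fixed constant.

```python
import math
import random

# Hypothetical VM catalog: (name, hourly price in USD, relative speed).
# These values are illustrative assumptions, not data from the paper.
VM_TYPES = [("small", 0.10, 1.0), ("medium", 0.20, 2.2), ("large", 0.40, 4.8)]

WORK_UNITS = 96.0   # assumed total work: hours on a single speed-1.0 VM
MAX_PER_TYPE = 16   # assumed upper bound on VMs of each type


def estimate(config):
    """Return (hours, dollars) for config = number of VMs of each type."""
    speed = sum(n * s for n, (_, _, s) in zip(config, VM_TYPES))
    if speed == 0:
        return float("inf"), float("inf")
    hours = WORK_UNITS / speed  # assumes perfect linear speedup
    rate = sum(n * p for n, (_, p, _) in zip(config, VM_TYPES))
    dollars = math.ceil(hours) * rate  # assumes billing per started hour
    return hours, dollars


def cost(config, deadline, budget, alpha=0.5):
    """Weighted sum of normalized time and money; lower is better.

    Configurations that violate the deadline or the budget get infinite cost.
    """
    hours, dollars = estimate(config)
    if hours > deadline or dollars > budget:
        return float("inf")
    return alpha * hours / deadline + (1 - alpha) * dollars / budget


def dimension(deadline, budget, generations=60, pop_size=30, seed=0):
    """Search for a low-cost configuration with a simple elitist GA."""
    rng = random.Random(seed)
    k = len(VM_TYPES)

    def fitness(c):
        return cost(c, deadline, budget)

    pop = [[rng.randint(0, MAX_PER_TYPE) for _ in range(k)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k)         # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(k)              # mutate one gene by +/- 1
            child[i] = min(MAX_PER_TYPE, max(0, child[i] + rng.choice((-1, 1))))
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    if fitness(best) == float("inf"):
        return None                           # no feasible configuration found
    return best


if __name__ == "__main__":
    best = dimension(deadline=4.0, budget=12.0)
    if best is None:
        print("no feasible configuration found")
    else:
        hours, dollars = estimate(best)
        for (name, _, _), n in zip(VM_TYPES, best):
            print(f"{name}: {n}")
        print(f"estimated time: {hours:.2f} h, estimated cost: ${dollars:.2f}")
```

Treating the deadline and budget as hard constraints (infinite cost when violated) keeps the sketch short; a penalty term that grows with the size of the violation is a common alternative when feasible configurations are rare.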