Many task computing for orthologous genes identification in protozoan genomes using Hydra

  • Authors:
  • Fábio Coutinho; Eduardo Ogasawara; Daniel de Oliveira; Vanessa Braganholo; Alexandre A. B. Lima; Alberto M. R. Dávila; Marta Mattoso

  • Affiliations:
  • COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil and Federal Center of Technological Education, Rio de Janeiro, Brazil; COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Fluminense Federal University, Niterói, Brazil; COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Oswaldo Cruz Institute FIOCRUZ, Rio de Janeiro, Brazil and Computational and Systems Biology Pole FIOCRUZ Rio de Janeiro, Brazil; COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2011

Abstract

One of the main advantages of using a scientific workflow management system (SWfMS) is that it orchestrates data flows among scientific activities and registers the provenance of the whole workflow execution. Nevertheless, controlling the execution of distributed activities in high performance computing environments from a SWfMS presents challenges such as steering control and provenance gathering. These challenges can become complex to address in bioinformatics experiments, particularly in Many Task Computing scenarios. This paper presents a data parallelism solution for a bioinformatics experiment supported by Hydra, a middleware that bridges SWfMS and high performance computing to enable workflow parallelization with provenance gathering. Many Task Computing parallelization strategies in Hydra can be registered and reused, and provenance can be gathered uniformly. We have evaluated Hydra using an Orthologous Gene Identification workflow. Experimental results show that a systematic approach for distributing parallel activities is viable, sparing scientists' time and reducing operational errors, with the additional benefits of distributed provenance support. Copyright © 2011 John Wiley & Sons, Ltd. (The speed-up is based on comparisons with executions using one core in the cluster.)