Optimizing bioinformatics workflows for data analysis using cloud management techniques

  • Authors:
  • Vincent C. Emeakaroha;Paweł P. Łabaj;Michael Maurer;Ivona Brandic;David P. Kreil

  • Affiliations:
  • Vienna University of Technology, Vienna, Austria;Boku University Vienna, Vienna, Austria;Vienna University of Technology, Vienna, Austria;Vienna University of Technology, Vienna, Austria;Boku University Vienna, Vienna, Austria

  • Venue:
  • Proceedings of the 6th workshop on Workflows in support of large-scale science
  • Year:
  • 2011

Abstract

With the rapid development in recent years of high-throughput technologies in the life sciences, huge amounts of data are being generated and stored in databases. Despite significant advances in computing capacity and performance, analysing these large-scale data in search of biomedically relevant patterns remains a challenging task. Scientific workflow applications support data mining in more complex scenarios that include many data sources and computational tools, as commonly found in bioinformatics. A scientific workflow application is a holistic unit that defines, executes, and manages scientific applications using different software tools. Existing workflow applications are process- or data-oriented rather than resource-oriented. Thus, they lack efficient computational resource management capabilities, such as those provided by Cloud computing environments. Insufficient computational resources disrupt the execution of workflow applications, wasting time and money. To address this issue, advanced resource monitoring and management strategies are required to determine the resource consumption behaviours of workflow applications so that resources can be allocated and deallocated dynamically. In this paper, we present a novel Cloud resource monitoring technique and a knowledge management strategy to manage computational resources for workflow applications in order to guarantee their performance goals and their successful completion. We present the design description of these techniques, demonstrate how they can be applied to scientific workflow applications, and present first evaluation results as a proof of concept.