Scientific experiments based on computer simulations can be defined, executed, and monitored using Scientific Workflow Management Systems (SWfMS). Several SWfMS are available, each with a different goal and a different engine. Due to the exploratory nature of their analyses, scientists often need to run parameter sweep (PS) workflows, i.e., workflows that are invoked repeatedly with different input data. These workflows generate a large number of tasks that are submitted to High Performance Computing (HPC) environments. Different execution models for a workflow can differ significantly in performance in HPC. However, selecting the best execution model for a given workflow is difficult, because many workflow characteristics may affect parallel execution. We conducted a study showing the performance impact of different execution models when running PS workflows in HPC environments. Our study contributes a characterization of PS workflow patterns (the basis for many existing scientific workflows) and their behavior under different execution models in HPC. We evaluated four execution models for running workflows in parallel, measuring the performance of small, large, and complex workflows under each model. The results can be used as a guideline for selecting the best model for a given scientific workflow execution in HPC. Our evaluation may also serve as a basis for workflow designers to analyze the expected behavior of an HPC workflow engine based on the characteristics of PS workflows.
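To make the notion concrete, a parameter sweep invokes the same workflow activity once per combination of input parameters, with each invocation becoming an independent task that can be dispatched in parallel. The sketch below is a minimal illustration of this fan-out pattern only (the `simulate` activity and the parameter names are hypothetical, not from the paper), using a thread pool to stand in for an HPC scheduler:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def simulate(pressure, temperature):
    # Placeholder for a real simulation activity; returns a toy result.
    return pressure * temperature

def parameter_sweep(pressures, temperatures, max_workers=4):
    # Each parameter combination becomes one independent task,
    # mirroring how a PS workflow is invoked repeatedly over its inputs.
    combos = list(product(pressures, temperatures))
    ps, ts = zip(*combos)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() fans the invocations out across the worker pool.
        return list(pool.map(simulate, ps, ts))

results = parameter_sweep([1, 2], [10, 20])
```

In a real SWfMS the pool would be replaced by the engine's own execution model (e.g., static task partitioning or dynamic scheduling over cluster nodes), which is exactly the choice whose performance impact the study measures.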