In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. These workflows are growing in complexity and in their demand for computational resources, and therefore require parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm in which resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although clouds initially focused on high-throughput computing, they are already being used as HPC environments in which elastic resources can be instantiated on demand during the execution of a scientific workflow. However, this model raises important open challenges, such as the scheduling of workflow activities. Scheduling parallel scientific workflows in the cloud is a complex task, since it must account for many different criteria and exploit elasticity to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for the parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability, and financial cost. Besides scheduling workflow activities according to a three-objective cost model, the approach scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We thoroughly validated the approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
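To make the idea of a three-objective cost model concrete, the following is a minimal sketch, not the heuristic or cost model proposed in the paper. It assumes a simple weighted sum over estimated finish time, failure probability, and monetary cost, combined with a greedy assignment of activities to virtual machines. The names `VM`, `schedule`, the fields `speed`, `failure_rate`, `price_per_hour`, and the weight tuple are all hypothetical. A realistic model would also normalize the three criteria to comparable scales, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VM:
    """Hypothetical description of a virtual machine (all fields are assumptions)."""
    name: str
    speed: float           # relative compute speed (1.0 = baseline)
    failure_rate: float    # estimated probability that an activity fails here
    price_per_hour: float  # monetary cost per hour of use


def schedule(
    tasks: Dict[str, float],
    vms: List[VM],
    weights: Tuple[float, float, float],
) -> Dict[str, str]:
    """Greedily map each task to the VM with the lowest weighted cost.

    tasks maps a task name to its estimated duration in hours on a
    baseline VM. weights is (w_time, w_reliability, w_money); a higher
    weight makes that criterion matter more. Lower cost is better.
    """
    w_time, w_rel, w_money = weights
    ready = {vm.name: 0.0 for vm in vms}  # hour at which each VM becomes free
    plan: Dict[str, str] = {}

    # Place the longest tasks first, a common greedy ordering.
    for task, hours in sorted(tasks.items(), key=lambda kv: -kv[1]):

        def score(vm: VM) -> float:
            run = hours / vm.speed          # run time on this VM
            finish = ready[vm.name] + run   # includes work already queued
            money = run * vm.price_per_hour
            return w_time * finish + w_rel * vm.failure_rate + w_money * money

        best = min(vms, key=score)
        ready[best.name] += hours / best.speed
        plan[task] = best.name

    return plan
```

With weights that favor time only, the sketch sends work to the fastest machine; with weights that favor money only, it prefers the cheapest one. Elastic scaling could be layered on top by adding or removing entries in `vms` between scheduling rounds, again as an assumption rather than a description of SciCumulus.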