Scientific workflow design for mere mortals
Future Generation Computer Systems
Despite growing interest in scientific workflow technologies in recent years, workflow design remains a challenging, slow, and often error-prone process, which limits the further adoption of scientific workflows. Based on practical experience with data-driven workflows, we identify and illustrate a number of recurring scientific workflow design challenges: parameter-rich functions; data assembly, disassembly, and cohesion; conditional execution; iteration; and, more generally, workflow evolution. In conventional approaches, such challenges usually lead to the introduction of different types of "shims", i.e., intermediary workflow steps that act as adapters between components that would otherwise be wired together incorrectly. However, relying heavily on shims leads to brittle (i.e., change-intolerant) workflow designs that are hard to comprehend and maintain. To address this problem, we present a general workflow design paradigm called virtual data assembly lines (VDAL). In this paper, we show how the VDAL approach can overcome common scientific workflow design challenges and improve workflow designs by exploiting (i) a semistructured, nested data model such as XML, (ii) a flexible, statically analyzable configuration mechanism (e.g., an XQuery fragment), and (iii) an underlying virtual assembly line model that is resilient to workflow and data changes. The approach has been implemented as Kepler/COMAD and applied to improve the design of complex, real-world workflows.
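The assembly-line idea can be illustrated with a minimal Python sketch. It is not the Kepler/COMAD API: the names `Item`, `Collection`, and `make_station` are hypothetical, and a simple match on an item's kind stands in for the XQuery-style configuration mentioned in the abstract. Each step works only on the items in its scope and copies everything else through unchanged, which is the property that is meant to remove the need for adapter steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Item:
    """A leaf data item tagged with a kind, e.g. 'sequence' or 'note'."""
    kind: str
    value: object


@dataclass
class Collection:
    """A labeled, ordered container that may hold items or nested collections."""
    label: str
    children: List[Union["Collection", Item]] = field(default_factory=list)


Node = Union[Collection, Item]


def make_station(scope: str, work: Callable[[Item], List[Item]]) -> Callable[[Node], Node]:
    """Build a step that rewrites items of kind `scope` found inside collections.

    Anything outside the scope is passed through untouched, so the step needs
    no adapter to strip away or re-attach the surrounding data.
    """
    def station(node: Node) -> Node:
        if isinstance(node, Item):
            return node  # a bare item has no enclosing collection to rewrite
        new_children: List[Node] = []
        for child in node.children:
            if isinstance(child, Item) and child.kind == scope:
                new_children.extend(work(child))      # in scope: replace with results
            else:
                new_children.append(station(child))   # out of scope: recurse or copy
        return Collection(node.label, new_children)
    return station


def to_upper(item: Item) -> List[Item]:
    """Hypothetical step: normalize a sequence to upper case."""
    return [Item("sequence", str(item.value).upper())]


def add_length(item: Item) -> List[Item]:
    """Hypothetical step: keep the sequence and add its length next to it."""
    return [item, Item("length", len(str(item.value)))]


project = Collection("project", [
    Item("note", "keep me"),
    Collection("sample", [Item("sequence", "acgt")]),
])

# The "assembly line" is a plain linear chain of stations.
line = [make_station("sequence", to_upper), make_station("sequence", add_length)]
result = project
for station in line:
    result = station(result)
```

In this sketch the `note` item and the nesting under `sample` survive both steps without either step mentioning them. Adding further out-of-scope data to `project` would not require rewiring the chain, which is the kind of resilience to data changes the abstract attributes to the approach.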