Modeling and optimization of scientific workflows

  • Authors:
  • Daniel Zinn

  • Affiliations:
  • University of California at Davis, Davis, CA

  • Venue:
  • Ph.D. '08 Proceedings of the 2008 EDBT Ph.D. workshop
  • Year:
  • 2008

Abstract

Simulation and computer-aided data analysis have become an integral part of many traditional sciences and have spawned virtual observatories and even entirely new disciplines, e.g., bioinformatics. Scientific workflow systems are built to model and automate scientific applications and thereby increase scientists' productivity. In this paper, we present desiderata that we believe scientific workflow systems should satisfy from a scientist's point of view. In particular, they should support data modeling, be resilient to changes in the input data, check workflow well-formedness, and automatically optimize workflow specifications for efficient execution. We argue that current approaches do not adequately address these desiderata; in particular, conventional workflows need to be changed radically to cope with common changes in the input data structure. Workflows built using a Collection-Oriented Modeling and Design (Comad) approach, on the other hand, exhibit much greater resilience to input changes. We propose to further extend and formalize Comad by creating a separate configuration layer that bridges the gap between scientific functionality (e.g., scripts, programs, or web services) and the high-level workflow graph. The design of this gap language and an appropriate type system is part of the proposed Ph.D. project. As an initial result, we show how to adopt XML regular expression types on the workflow channels and how to characterize actor behavior by defining actor signatures. This allows us to propagate schema information through the workflow, to predict the workflow output schema (well-formedness), and to automatically optimize data routing so that less data is shipped overall and workflow concurrency increases.
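
The idea of propagating schema information through actor signatures can be illustrated with a deliberately simplified sketch. It is not the type system proposed in the paper: a plain set of element labels stands in for an XML regular expression type, and the sketch assumes that an actor passes through any data it does not read. The names `ActorSignature`, `propagate`, and `bypassable`, as well as the example actors and labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorSignature:
    """Hypothetical signature: labels an actor reads and labels it adds."""
    name: str
    reads: frozenset
    writes: frozenset


def propagate(input_schema, pipeline):
    """Predict the channel schema after each actor in a linear pipeline.

    Assumption: actors pass through everything they do not read, so the
    schema only grows. Raises ValueError when an actor reads a label that
    is not available, a toy stand-in for a well-formedness check.
    """
    schema = frozenset(input_schema)
    trace = []
    for actor in pipeline:
        missing = actor.reads - schema
        if missing:
            raise ValueError(
                f"{actor.name} needs missing labels: {sorted(missing)}"
            )
        schema = schema | actor.writes
        trace.append((actor.name, schema))
    return trace


def bypassable(schema, actor):
    """Labels the actor never reads, so they need not be shipped to it."""
    return frozenset(schema) - actor.reads


# Illustrative two-step pipeline with made-up actors and labels.
align = ActorSignature("align", frozenset({"sequence"}), frozenset({"alignment"}))
tree = ActorSignature("tree", frozenset({"alignment"}), frozenset({"phylogeny"}))

trace = propagate({"sequence", "metadata"}, [align, tree])
skip_for_align = bypassable({"sequence", "metadata"}, align)
```

In this toy model, `trace` holds the predicted schema after each actor, and `skip_for_align` lists the labels that could be routed around the first actor instead of being shipped to it. Reversing the pipeline order makes `propagate` raise an error, because `tree` would read a label that no earlier step provides.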