Scripting distributed scientific workflows using Weaver

  • Authors:
  • Peter Bui; Li Yu; Andrew Thrasher; Rory Carmichael; Irena Lanc; Patrick Donnelly; Douglas Thain

  • Affiliations:
  • Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2012

Abstract

Weaver is a high-level distributed computing framework that enables researchers to construct scalable scientific data-processing workflows. Instead of developing a new workflow language, we introduce Weaver, a domain-specific language built on top of Python, which takes advantage of users' familiarity with that programming language, minimizes barriers to adoption, and allows for integration with a rich ecosystem of existing software. In this paper, we provide an overview of Weaver's programming model, which allows users to organize and specify scientific workflows by using a collection of datasets, functions, and abstractions. We also explain how these workflow specifications are compiled into a directed acyclic graph that is used by the Makeflow workflow manager to dispatch work to a variety of distributed execution platforms. To demonstrate the power and benefits of using the framework in constructing scientific research applications, the paper examines four distinct real-world applications scripted using Weaver and analyzes the performance, scalability, and impact of the generated distributed scientific workflows. Copyright © 2011 John Wiley & Sons, Ltd.
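
The compilation idea described in the abstract (datasets, functions, and abstractions turned into a directed acyclic graph for Makeflow) can be illustrated with a minimal sketch. This is not Weaver's actual API: the `Workflow` and `Rule` classes, the `map` and `merge` methods, and the file names are all hypothetical, chosen only to show how applying an abstraction to a dataset might yield Make-style rules of the kind Makeflow consumes.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One DAG node: a command plus the files it reads and writes."""
    inputs: list
    outputs: list
    command: str


@dataclass
class Workflow:
    """Collects rules as abstractions are applied (hypothetical, not Weaver's API)."""
    rules: list = field(default_factory=list)

    def map(self, command_template, dataset, suffix):
        """Map-style abstraction: apply one command to every file in a dataset."""
        outputs = []
        for path in dataset:
            out = path + suffix
            command = command_template.format(IN=path, OUT=out)
            self.rules.append(Rule([path], [out], command))
            outputs.append(out)
        return outputs  # the outputs act as a new dataset for later stages

    def merge(self, command_template, dataset, output):
        """Merge-style abstraction: combine a whole dataset into one output file."""
        command = command_template.format(IN=" ".join(dataset), OUT=output)
        self.rules.append(Rule(list(dataset), [output], command))
        return output

    def to_makeflow(self):
        """Emit the rules in Make-style form: 'targets: sources', then a tab-indented command."""
        blocks = []
        for rule in self.rules:
            header = "{}: {}".format(" ".join(rule.outputs), " ".join(rule.inputs))
            blocks.append(header + "\n\t" + rule.command)
        return "\n\n".join(blocks) + "\n"


wf = Workflow()
raw = ["a.txt", "b.txt"]                      # a dataset: a list of input files
counted = wf.map("wc -l < {IN} > {OUT}", raw, ".count")
wf.merge("cat {IN} > {OUT}", counted, "total.count")
print(wf.to_makeflow())
```

Because each rule records its input and output files, the dependency edges of the graph are implicit in the file names: the merge rule depends on both map rules, which are independent of each other and could be dispatched in parallel.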