A notation and system for expressing and executing cleanly typed workflows on messy scientific data

  • Authors:
  • Yong Zhao; Jed Dobson; Ian Foster; Luc Moreau; Michael Wilde

  • Affiliations:
  • University of Chicago, Chicago, IL; Dartmouth College, Hanover, NH; University of Chicago, Chicago, IL; University of Southampton, Southampton, U.K.; Argonne National Laboratory, Argonne, IL

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 2005

Abstract

The description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed, compositional workflow notation within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful runtime system that supports distributed execution. The resulting notation and system are capable both of expressing complex workflows in a simple, compact form, and of enacting those workflows in distributed environments. We apply our technique to cognitive neuroscience workflows that analyze functional MRI image data, and demonstrate significant reductions in code size relative to other approaches.