Towards Case-Based Support for e-Science Workflow Generation by Mining Provenance

  • Authors:
  • David Leake; Joseph Kendall-Morwick

  • Affiliations:
  • Computer Science Department, Indiana University, Bloomington, IN 47405, U.S.A.; Computer Science Department, Indiana University, Bloomington, IN 47405, U.S.A.

  • Venue:
  • ECCBR '08: Proceedings of the 9th European Conference on Advances in Case-Based Reasoning
  • Year:
  • 2008

Abstract

e-Science brings large-scale computation to bear on scientific problems, often by performing sequences of computational tasks organized into workflows and executed on distributed Web resources. Sophisticated AI tools have been developed to apply knowledge-rich methods to compose scientific workflows by generative planning, but the required knowledge can be difficult to acquire. Current work by the cyberinfrastructure community aims to routinely capture provenance during workflow execution, which would provide a new experience-based knowledge source for workflow generation: large-scale databases of workflow execution traces. This paper proposes exploiting these databases with a "knowledge light" approach to reuse, applying CBR methods to those traces to support scientists' workflow generation process. It introduces e-Science workflows as a CBR domain, sketches key technical issues, and illustrates directions for addressing these issues through ongoing research on Phala, a system that supports workflow generation by aiding reuse of portions of prior workflows. Experiments using workflow data collected by the myGrid and myExperiment projects suggest that Phala's methods have promise for assisting workflow composition in the context of scientific experimentation.