Exploring repositories of scientific workflows

  • Authors: Julia Stoyanovich; Ben Taskar; Susan Davidson

  • Affiliations: University of Pennsylvania, Philadelphia, PA (all authors)

  • Venue: Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science
  • Year: 2010

Abstract

Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we present some initial experiences of information discovery in repositories of scientific workflows. In the first part of the paper we consider a collection of VisTrails workflows, and explore how this collection may be summarized when workflow modules are used as features. We present a hierarchical browsable view of the repository in which categories are derived using frequent itemset mining or latent Dirichlet allocation. We demonstrate that both approaches may be used for effective data exploration. In the second part of the paper we focus on a collection of Taverna workflows from myExperiment.org, and consider how these workflows may be browsed using modules and tags as features. Finally, we outline some interesting challenges and describe conditions under which these techniques work well for repositories of scientific workflows, and conditions under which additional work is needed for effective data exploration.
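
To make the idea of "workflow modules as features" concrete, the following is a minimal sketch of frequent itemset mining over a toy repository. It is not the authors' implementation: the workflow names, module names, and the helper functions `frequent_itemsets` and `browse` are all hypothetical, and the counting is a brute-force enumeration rather than an optimized algorithm such as Apriori. Each workflow is treated as a set of modules, and any module set that occurs in at least `min_support` workflows becomes a candidate category for browsing.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(workflows, min_support, max_size=3):
    """Count module sets (up to max_size modules) and keep those that
    occur in at least min_support workflows.

    Brute-force enumeration: fine for a toy example, not for a large repository.
    """
    counts = Counter()
    for modules in workflows.values():
        unique = sorted(set(modules))  # each workflow counts a module set once
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def browse(workflows, itemsets):
    """Map each frequent module set to the workflows that contain it,
    listing larger (more specific) module sets first."""
    view = {}
    for items in sorted(itemsets, key=lambda s: (-len(s), sorted(s))):
        members = sorted(
            name for name, modules in workflows.items() if items <= set(modules)
        )
        view[tuple(sorted(items))] = members
    return view


# Hypothetical toy repository: workflow name -> modules it uses.
workflows = {
    "wf1": ["FileReader", "Isosurface", "Renderer"],
    "wf2": ["FileReader", "Isosurface", "Renderer", "Colormap"],
    "wf3": ["FileReader", "Histogram"],
    "wf4": ["HTTPFetch", "Histogram"],
}

itemsets = frequent_itemsets(workflows, min_support=2)
view = browse(workflows, itemsets)

for category, members in view.items():
    print(", ".join(category), "->", members)
```

In this sketch the module set (FileReader, Isosurface, Renderer) groups `wf1` and `wf2`, while the single module Histogram groups `wf3` and `wf4`. Ordering larger sets first is one simple way to suggest a hierarchy, under the assumption that more specific module combinations make more informative categories.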