(Re)Use in public scientific workflow repositories

  • Authors:
  • Johannes Starlinger; Sarah Cohen-Boulakia; Ulf Leser

  • Affiliations:
  • Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany; Laboratoire de Recherche en Informatique, CNRS UMR 8623 and INRIA AMIB, Université Paris-Sud, France; Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany

  • Venue:
  • SSDBM'12: Proceedings of the 24th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2012


Abstract

Scientific workflows help in designing, managing, monitoring, and executing in-silico experiments. Because scientific workflows are often complex, sharing them through public workflow repositories has become an important issue for the community. However, as the number of workflows available in such repositories grows, users increasingly need assistance in discovering the right workflow for a given task. A key step toward this goal is identifying the functional elements that workflows share, as a basis for deriving meaningful similarity measures between workflows. In this paper, we present the results of a study we performed on what is probably the largest open workflow repository, myExperiment.org. Our contributions are threefold: (i) we discuss the critical problem of identifying identical or similar (sub-)workflows and workflow elements; (ii) we study, for the first time, the problem of cross-author reuse; and (iii) we provide a detailed analysis of how frequently elements are reused between workflows and between authors, and identify characteristics of shared elements.
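The kind of reuse counting described in contribution (iii) can be illustrated with a minimal sketch. It assumes that the hard problem raised in contribution (i), deciding when two workflow elements are the same, has already been solved, so that each element is represented by a directly comparable key. The `Workflow` class and the `reuse_stats` and `cross_author_reused` functions are hypothetical names chosen for illustration, not the authors' implementation or any myExperiment.org API. For each element key, the sketch counts the workflows that contain it and the distinct authors of those workflows. An element used by more than one author is treated as reused across authors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    # Hypothetical record: an identifier, its author, and the set of element
    # keys it uses. Keys are assumed to be already normalized so that
    # "the same" element in two workflows yields an equal key.
    wf_id: str
    author: str
    elements: frozenset


def reuse_stats(workflows):
    """Map each element key to (number of workflows, number of distinct authors)."""
    wf_count = defaultdict(int)
    authors = defaultdict(set)
    for wf in workflows:
        # elements is a set, so each key is counted at most once per workflow
        for elem in wf.elements:
            wf_count[elem] += 1
            authors[elem].add(wf.author)
    return {e: (wf_count[e], len(authors[e])) for e in wf_count}


def cross_author_reused(stats):
    """Return, sorted, the element keys used by more than one distinct author."""
    return sorted(e for e, (_, n_authors) in stats.items() if n_authors > 1)
```

For example, if `alice` uses `blast` in two workflows and `fetch_seq` appears in one workflow by `alice` and one by `bob`, both elements occur in two workflows, but only `fetch_seq` counts as cross-author reuse. Separating the two counts keeps reuse by a single author apart from reuse across authors, which is the distinction the study draws.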