Optimizing user views for workflows

Authors:
Olivier Biton;Susan B. Davidson;Sanjeev Khanna;Sudeepa Roy
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA
Venue:
Proceedings of the 12th International Conference on Database Theory
Year:
2009

References (12)

Introduction to Algorithms.
Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. SSDBM '02: Proceedings of the 14th International Conference on Scientific and Statistical Database Management.
The recognition of Series Parallel digraphs. STOC '79: Proceedings of the eleventh annual ACM symposium on Theory of computing.
Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys (CSUR).
A survey of data provenance in e-science. ACM SIGMOD Record.
Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics.
Report on the International Provenance and Annotation Workshop (IPAW'06), 3-5 May 2006, Chicago. ACM SIGMOD Record.
Zoom*UserViews: querying relevant provenance in workflow systems. VLDB '07: Proceedings of the 33rd international conference on Very large data bases.
Provenance in collection-oriented scientific workflows. Concurrency and Computation: Practice & Experience (The First Provenance Challenge).
Provenance and scientific workflows: challenges and opportunities. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Querying and Managing Provenance through User Views in Scientific Workflows. ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.
Managing rapidly-evolving scientific workflows. IPAW'06: Proceedings of the 2006 international conference on Provenance and Annotation of Data.

Cited by (6)

Detecting and resolving unsound workflow views for correct provenance analysis. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.
Privacy issues in scientific workflow provenance. Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science.
Generating sound workflow views for correct provenance analysis. ACM Transactions on Database Systems (TODS).
On provenance and privacy. Proceedings of the 14th International Conference on Database Theory.
Reconstructing unsound data provenance view in scientific workflow. APWeb'12: Proceedings of the 14th international conference on Web Technologies and Applications.
Performance evaluation of parallel strategies in public clouds: A study with phylogenomic workflows. Future Generation Computer Systems.

Abstract

A technique called user views has recently been proposed to focus user attention on relevant information in response to provenance queries over workflow executions [1, 2]. Given user input on which modules in the workflow specification are relevant, a user view is a concise representation that clusters modules into a small number of composite modules (or clusters) such that (1) each composite module contains at most one relevant (atomic) module, and thus assumes the "meaning" of that module; and (2) no control or data dependencies (direct or indirect) between relevant modules are introduced (soundness) or removed (completeness). The goal is to find a user view with the smallest number of composite modules. We show that for workflow specifications that are general graphs, regardless of the number of distinct modules in the input workflow and the structure of interaction between them, there always exists a user view of size at most $(2^{k-1} - k)^2 + k$, where $k$ is the number of relevant modules. Moreover, a good user view with at most $(2^{k-1} - k)^2 + k$ clusters can be computed in time polynomial in the size of the graph. We also show that this upper bound is tight. Thus, in general graphs, the number of composite modules can be exponentially large in $k$, even in an optimum user view for the specification. We also give a characterization of a good user view in terms of structural properties of each cluster in the user view. For series-parallel workflow graphs, however, we show that there is always a user view with at most $2k - 3$ composite modules; further, there exist series-parallel graphs where every user view requires at least $2k - 3$ composite modules. Such graphs capture the structure of many scientific and other workflows that we have encountered in practice. For this class of graphs, we give a simple, linear-time algorithm for constructing an optimum user view for a given specification.
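The definitions in the abstract can be made concrete with a small sketch. It assumes a workflow specification given as a Python dict mapping each module to its successor modules, a set of relevant modules, and a view given as a dict mapping each module to a cluster id. It reads soundness and completeness as "for distinct relevant modules u and v, u reaches v in the workflow if and only if u's composite module reaches v's composite module in the view". All function names are hypothetical. In particular, `signature_user_view` is an illustrative baseline that groups each non-relevant module by the sets of relevant modules that reach it and that it reaches. It is not the paper's polynomial-time algorithm and does not meet the $(2^{k-1} - k)^2 + k$ bound: on the example below it uses 4 composite modules, while that bound for $k = 2$ is 2. `general_bound` simply evaluates the stated bound to show how quickly it grows with $k$.

```python
# Sketch: checking and building "user views" of a workflow DAG.
# Assumptions (not from the paper): a workflow is a dict mapping each module to
# its successor modules; a view is a dict mapping each module to a cluster id.


def reachable(graph, start):
    """Return the nodes reachable from `start` by one or more edges (iterative DFS)."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def nodes_of(graph):
    """All modules, including those that only appear as successors."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    return nodes


def quotient(graph, cluster_of):
    """Composite-module graph: C1 -> C2 iff some edge u -> v has u in C1 and v in C2."""
    q = {}
    for u, succs in graph.items():
        for v in succs:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu != cv:
                q.setdefault(cu, set()).add(cv)
    return q


def is_user_view(graph, relevant, cluster_of):
    """Check condition (1) and soundness/completeness between relevant modules."""
    # (1) No composite module may contain two relevant modules.
    if len({cluster_of[r] for r in relevant}) != len(relevant):
        return False
    q = quotient(graph, cluster_of)
    for u in relevant:
        in_workflow = reachable(graph, u) & relevant
        reach_q = reachable(q, cluster_of[u])
        in_view = {r for r in relevant if cluster_of[r] in reach_q}
        # Compare dependencies on *other* relevant modules only.
        if in_workflow - {u} != in_view - {u}:
            return False
    return True


def signature_user_view(graph, relevant):
    """Hypothetical baseline (not the paper's algorithm): each relevant module is
    its own cluster; non-relevant modules are grouped by the pair
    (relevant modules reaching them, relevant modules they reach)."""
    nodes = nodes_of(graph)
    reach = {n: reachable(graph, n) for n in nodes}
    cluster_of = {}
    for n in nodes:
        if n in relevant:
            cluster_of[n] = ("relevant", n)
        else:
            ancestors = frozenset(r for r in relevant if n in reach[r])
            descendants = frozenset(reach[n] & relevant)
            cluster_of[n] = ("group", ancestors, descendants)
    return cluster_of


def general_bound(k):
    """Upper bound on the size of an optimum user view for general graphs, as stated in the abstract."""
    return (2 ** (k - 1) - k) ** 2 + k


if __name__ == "__main__":
    wf = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
    rel = {"a", "d"}
    view = signature_user_view(wf, rel)
    print(len(set(view.values())), is_user_view(wf, rel, view))  # 4 True

    # Merging unrelated x and y fakes a dependency r1 -> r2 (unsound view).
    bad = {"r1": ["y"], "x": ["r2"], "y": [], "r2": []}
    merged = {"r1": "C1", "r2": "C2", "x": "M", "y": "M"}
    print(is_user_view(bad, {"r1", "r2"}, merged))  # False

    print([general_bound(k) for k in range(1, 6)])  # [1, 2, 4, 20, 126]
```

Under this reading the baseline grouping is sound: along any edge x → y, every relevant module that reaches x also reaches y, so the relevant-ancestor sets can only grow along a path of composite modules, and a path from u's cluster to v's cluster in the view therefore implies a path from u to v in the workflow. Completeness holds automatically here, because collapsing modules into clusters never removes a path and no two relevant modules share a cluster.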