Integrating databases and workflow systems

  • Authors:
  • Srinath Shankar; Ameet Kini; David J. DeWitt; Jeffrey Naughton

  • Affiliations:
  • University of Wisconsin, Madison, WI; University of Wisconsin, Madison, WI; University of Wisconsin, Madison, WI; University of Wisconsin, Madison, WI

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 2005

Abstract

There has been an information explosion in scientific fields such as high-energy physics, astronomy, environmental sciences and biology. There is a critical need for automated systems to manage scientific applications and data. Database technology is well suited to handle several aspects of workflow management. Contemporary workflow systems, however, are built from multiple, separately developed components and do not exploit the full power of DBMSs in handling large volumes of data. We advocate a holistic view of a workflow management system (WFMS) that includes not only workflow modeling but also planning, scheduling, data management and cluster management. It is therefore worthwhile to explore ways in which databases can be augmented to manage workflows in addition to data. We present a language for modeling workflows that is tightly integrated with SQL. Each scientific program in a workflow is associated with an active table or view. Data products are defined in relational form, and both program invocation and querying are done in SQL. This tight coupling between workflow management and data manipulation is an advantage for data-intensive scientific programs.
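To make the idea of tying a program to a table or view more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the authors' language: the function `calibrate`, the tables `readings` and `calibrated`, and the use of a SQLite user-defined function as a stand-in for a scientific program are all hypothetical. The sketch only shows a derived data product defined as a view, so that an ordinary SQL query over that view is what invokes the program.

```python
import sqlite3

# Hypothetical stand-in for one scientific program in a workflow.
# Assumption: a SQLite user-defined function plays the role that the
# paper assigns to a program attached to an active table or view.
def calibrate(raw: float) -> float:
    """Toy 'program': apply a fixed gain and offset to a raw reading."""
    return raw * 2.0 + 1.0

def build_workflow() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Register the program so that SQL statements can invoke it by name.
    conn.create_function("calibrate", 1, calibrate)
    # Input data product, defined relationally.
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, raw REAL)")
    conn.executemany(
        "INSERT INTO readings (raw) VALUES (?)",
        [(1.0,), (2.5,), (4.0,)],
    )
    # Derived data product: a view whose rows are produced by running the
    # program over the input table each time the view is queried.
    conn.execute(
        "CREATE VIEW calibrated AS "
        "SELECT id, calibrate(raw) AS value FROM readings"
    )
    return conn

if __name__ == "__main__":
    conn = build_workflow()
    # Querying the derived product invokes the program through SQL.
    for row in conn.execute(
        "SELECT id, value FROM calibrated WHERE value > 5 ORDER BY id"
    ):
        print(row)
```

Querying `calibrated` with a predicate such as `value > 5` returns only the rows whose program output passes the filter, so program invocation and querying happen in one SQL statement. A plain SQLite view recomputes its rows on every query and offers none of the planning, scheduling or cluster management the abstract calls for; it is meant only to illustrate the coupling of programs with relational data products.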