Time-bound analytic tasks on large datasets through dynamic configuration of workflows

  • Authors:
  • Yolanda Gil;Varun Ratnakar;Rishi Verma;Andrew Hart;Paul Ramirez;Chris Mattmann;Arni Sumarlidason;Samuel L. Park

  • Affiliations:
  • University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA; NASA Jet Propulsion Laboratory, Pasadena, CA; NASA Jet Propulsion Laboratory, Pasadena, CA; NASA Jet Propulsion Laboratory, Pasadena, CA; NASA Jet Propulsion Laboratory, Pasadena, CA; MDA Information Systems LLC, Gaithersburg, MD; MDA Information Systems LLC, Gaithersburg, MD

  • Venue:
  • WORKS '13 Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science
  • Year:
  • 2013

Abstract

Domain experts are often untrained in big data technologies, which limits their ability to exploit the data available to them. Workflow systems hide the complexities of high-end computing and software engineering by offering pre-packaged analytic steps that are combined into the multi-step methods experts commonly use. A current limitation of workflow systems is that they do not take user deadlines into account: they run the workflows selected by the user, however long execution takes. This is impractical when large datasets are involved, since users often prefer a faster answer even if it has lower precision or quality. In this paper, we present an extension to workflow systems that enables them to take user deadlines into account by automatically generating alternative workflow candidates and ranking them according to performance estimates. The system derives these estimates from workflow performance models built from prior workflow executions, and uses semantic technologies to reason about workflow options. Candidate workflows are presented to the user in a compact manner, ranked by their runtime estimates. We have implemented this approach in the WOOT system, which combines and extends capabilities from the WINGS semantic workflow system and the Apache OODT (Object Oriented Data Technology) workflow execution system.
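
The idea of generating alternative workflow candidates and ranking them against a deadline can be illustrated with a minimal sketch. This is not the WOOT, WINGS, or OODT implementation. The `Step` class, the linear `seconds_per_gb` cost model, the step names, and all numbers below are hypothetical and chosen only for illustration. The sketch assumes steps run sequentially and that each stage of a method has interchangeable implementations with different estimated costs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Step:
    name: str
    seconds_per_gb: float  # hypothetical linear cost model fitted from past runs


def estimate_runtime(workflow, dataset_gb):
    """Sum the per-step estimates; assumes steps run one after another."""
    return sum(step.seconds_per_gb * dataset_gb for step in workflow)


def rank_candidates(alternatives, dataset_gb, deadline_s):
    """Enumerate one implementation per stage, keep the candidates whose
    estimated runtime meets the deadline, and sort them fastest first."""
    candidates = [tuple(choice) for choice in product(*alternatives)]
    scored = [(estimate_runtime(c, dataset_gb), c) for c in candidates]
    feasible = [(t, c) for t, c in scored if t <= deadline_s]
    return sorted(feasible, key=lambda pair: pair[0])


# Hypothetical two-stage method with interchangeable implementations.
alternatives = [
    [Step("full_parse", 12.0), Step("sampled_parse", 3.0)],
    [Step("exact_cluster", 30.0), Step("approx_cluster", 8.0)],
]

ranked = rank_candidates(alternatives, dataset_gb=10.0, deadline_s=300.0)
for seconds, workflow in ranked:
    print(f"{seconds:7.1f} s  " + " -> ".join(step.name for step in workflow))
```

With these made-up numbers, two of the four candidates fit within the 300-second deadline, and the combination of the two cheaper steps is listed first. A real system would also need to surface the loss in precision or quality that the faster candidates imply, which this sketch leaves out.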