A semantic framework for automatic generation of computational workflows using distributed data and component catalogues

Authors:
Yolanda Gil;Pedro A. Gonzalez-Calero;Jihie Kim;Joshua Moody;Varun Ratnakar
Affiliations:
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA;Facultad de Informatica, Universidad Complutense de Madrid, 28040 Madrid, Spain;Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA;Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA;Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
Venue:
Journal of Experimental & Theoretical Artificial Intelligence
Year:
2011

Abstract

Computational workflows are a powerful paradigm for representing and managing complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent the application components that carry out individual computations, as well as the dataflow interdependencies among them. Workflow systems use these representations to manage various aspects of workflow creation and execution on behalf of users, such as the automatic assignment of execution resources. This article describes an approach to automating a new aspect of the process: the selection of application components and data sources. We present a novel approach that lets users specify varying amounts of detail and constraints in a workflow request. This includes constraints on input, intermediate or output data in the workflow; abstract workflow component classes rather than specific component implementations; and generic, reusable workflow templates that express a predefined combination of components. The algorithm elaborates the user request into a set of fully ground workflows, with specific choices of data sources and codes, that can be submitted for mapping and execution. It searches the space of possible candidate workflows by creating increasingly specialized versions of the original template and eliminating candidates that violate the constraints accumulated in the candidate workflow as components and data sources are selected. A novel feature of our approach is that it assumes a distributed architecture in which data and component catalogues are separate from the workflow system. The algorithm explicitly poses queries to these external catalogues, so no reasoning about data or component properties is assumed to occur within the workflow system. We describe our implementation of this approach in the Wings workflow system. This implementation uses the W3C Web Ontology Language (OWL) and associated reasoners to implement the workflow system as well as the data and component catalogues. This research demonstrates the use of artificial intelligence techniques to support the kinds of automation envisioned by the scientific community for large-scale distributed scientific data analysis.
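The abstract describes the search only in outline. As a rough illustration, here is a minimal Python sketch under strong simplifying assumptions; it is not the Wings algorithm, and `DataCatalogue`, `ComponentCatalogue`, `merge`, `elaborate` and the toy catalogue contents are all hypothetical. A template is assumed to be a linear list of abstract component classes, each component's requirements are treated as constraints on a single workflow input (as if already propagated back through the dataflow), and plain dictionary matching stands in for OWL reasoning inside the catalogues. Candidates are specialized one step at a time by querying the component catalogue, pruned as soon as their accumulated constraints conflict, and finally bound to data sources by querying the data catalogue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins for catalogue entries (not Wings/OWL structures).
@dataclass(frozen=True)
class DataSource:
    name: str
    props: Tuple[Tuple[str, str], ...]      # metadata, e.g. (("format", "csv"),)

@dataclass(frozen=True)
class Component:
    name: str
    cls: str                                # abstract component class implemented
    requires: Tuple[Tuple[str, str], ...]   # constraints imposed on the input data

class DataCatalogue:
    """External catalogue; the workflow system only sends it queries."""
    def __init__(self, sources: List[DataSource]):
        self._sources = sources

    def query(self, constraints: Dict[str, str]) -> List[DataSource]:
        # Return every source whose metadata satisfies all constraints.
        return [s for s in self._sources
                if all(dict(s.props).get(k) == v for k, v in constraints.items())]

class ComponentCatalogue:
    """External catalogue mapping abstract classes to concrete components."""
    def __init__(self, components: List[Component]):
        self._components = components

    def query(self, cls: str) -> List[Component]:
        return [c for c in self._components if c.cls == cls]

@dataclass
class Candidate:
    components: Tuple[str, ...]     # concrete components chosen so far
    constraints: Dict[str, str]     # constraints accumulated on the input data

def merge(constraints: Dict[str, str],
          new: Tuple[Tuple[str, str], ...]) -> Optional[Dict[str, str]]:
    """Add new constraints; return None if any of them conflicts."""
    merged = dict(constraints)
    for key, value in new:
        if merged.get(key, value) != value:
            return None
        merged[key] = value
    return merged

def elaborate(template: List[str], user_constraints: Dict[str, str],
              components: ComponentCatalogue,
              data: DataCatalogue) -> List[Tuple[str, Tuple[str, ...]]]:
    """Depth-first specialization of a template into fully ground workflows."""
    ground = []
    stack = [Candidate((), dict(user_constraints))]
    while stack:
        cand = stack.pop()
        step = len(cand.components)
        if step == len(template):
            # Every abstract step is bound: ask the data catalogue for inputs.
            for src in data.query(cand.constraints):
                ground.append((src.name, cand.components))
            continue
        for comp in components.query(template[step]):
            merged = merge(cand.constraints, comp.requires)
            if merged is None:
                continue  # prune: candidate violates accumulated constraints
            stack.append(Candidate(cand.components + (comp.name,), merged))
    return ground

# Hypothetical example: a two-step template and toy catalogues.
components = ComponentCatalogue([
    Component("ZScore", "Normalizer", (("format", "csv"),)),
    Component("MinMaxArff", "Normalizer", (("format", "arff"),)),
    Component("NaiveBayes", "Classifier", (("format", "arff"),)),
    Component("KNN", "Classifier", ()),
])
data = DataCatalogue([
    DataSource("iris.csv", (("format", "csv"), ("domain", "bio"))),
    DataSource("iris.arff", (("format", "arff"), ("domain", "bio"))),
    DataSource("sales.csv", (("format", "csv"), ("domain", "retail"))),
])
workflows = elaborate(["Normalizer", "Classifier"], {"domain": "bio"},
                      components, data)
for wf in sorted(workflows):
    print(wf)
```

In this toy run the `ZScore`/`NaiveBayes` combination is pruned before any data query because its accumulated `format` constraints conflict, and `sales.csv` is never returned because it fails the user's `domain` constraint, leaving three ground workflows. Keeping the catalogues behind a `query` method mirrors the separation the abstract emphasizes: the search loop never inspects catalogue internals, so the dictionary matching could in principle be swapped for calls to remote, reasoner-backed services.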