Mind your metadata: exploiting semantics for configuration, adaptation, and provenance in scientific workflows

  • Authors:
  • Yolanda Gil;Pedro Szekely;Sandra Villamizar;Thomas C. Harmon;Varun Ratnakar;Shubham Gupta;Maria Muslea;Fabio Silva;Craig A. Knoblock

  • Affiliations:
  • Information Sciences Institute, University of Southern California, Marina del Rey, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA;School of Engineering, University of California Merced, Merced, CA;School of Engineering, University of California Merced, Merced, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA;Information Sciences Institute, University of Southern California, Marina del Rey, CA

  • Venue:
  • ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II
  • Year:
  • 2011

Abstract

Semantic metadata describing scientific data is expensive to capture and is typically not used across entire data analytic processes. We present an approach in which semantic metadata is generated as scientific data is being prepared and is subsequently used to configure models and customize them to the data. The captured metadata includes sensor descriptions, data characteristics, data types, and process documentation. A workflow system then uses this metadata to select analytic models dynamically and to set up model parameters automatically. In addition, all aspects of data processing are documented, and the system can generate extensive provenance records for new data products based on the metadata. Because models are selected dynamically according to the metadata properties of the data being processed, the system generates more accurate results. We show results in analyzing stream metabolism for watershed ecosystem management.
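
The idea of metadata-driven model selection can be illustrated with a minimal sketch. It is not the authors' system: the rule table, the model names (`shallow_model`, `deep_model`), the metadata keys (`source`, `depth_m`), and the 0.5 threshold are all hypothetical and chosen only to show the pattern of picking a model from metadata, deriving its parameters, and recording provenance for the choice.

```python
# Illustrative sketch only: every name and threshold here is hypothetical.

# Each rule pairs a model name with a predicate over the metadata and a
# function that derives the model's parameters from that metadata.
RULES = [
    {
        "name": "shallow_model",
        "applies": lambda m: m["depth_m"] < 0.5,   # assumed threshold
        "parameters": lambda m: {"depth_m": m["depth_m"]},
    },
    {
        "name": "deep_model",
        "applies": lambda m: m["depth_m"] >= 0.5,  # assumed threshold
        "parameters": lambda m: {"depth_m": m["depth_m"]},
    },
]


def select_model(metadata, rules):
    """Return (model name, parameters, provenance) for the first matching rule."""
    for rule in rules:
        if rule["applies"](metadata):
            params = rule["parameters"](metadata)
            # Record which model was chosen, with what settings, for which data.
            provenance = {
                "model": rule["name"],
                "parameters": params,
                "derived_from": metadata["source"],
            }
            return rule["name"], params, provenance
    raise LookupError("no model matches the given metadata")


if __name__ == "__main__":
    metadata = {"source": "sensor-1", "depth_m": 0.3}
    print(select_model(metadata, RULES))
```

Keeping the rules as data rather than hard-coded branches means a new model can be added without changing the selection logic, and the provenance record is produced at the same point where the choice is made.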