Monitoring data quality in Kepler

  • Authors:
  • Aisa Na'im (Monash University, Caulfield East, Australia)
  • Daniel Crawl (San Diego Supercomputer Center, San Diego, CA)
  • Maria Indrawan (Monash University, Caulfield East, Australia)
  • Ilkay Altintas (San Diego Supercomputer Center, San Diego, CA)
  • Shulei Sun (University of California San Diego)

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010


Abstract

Data quality is an important component of modern scientific discovery. Many scientific discovery processes consume data from a diverse array of sources such as streaming sensor networks, web services, and databases. The validity of a scientific computation's results is highly dependent on the quality of these input data. Scientific workflow systems are increasingly used to automate scientific computations by facilitating experiment design, data capture, integration, processing, and analysis. These workflows may execute in grid or cloud environments, and if the data produced during workflow execution is deemed unusable or low in quality, execution should stop to avoid wasting these valuable resources. We propose an approach in the Kepler scientific workflow system for monitoring data quality and demonstrate its use in the oceanography and bioinformatics domains.
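The core idea of stopping a workflow when its data falls below an acceptable quality level can be sketched as a simple quality gate. The snippet below is an illustrative sketch only, not Kepler's actual API: the metric (`quality_score`, fraction of complete records), the threshold, and the `LowQualityError` exception are all hypothetical stand-ins for whatever checks a real workflow step would apply.

```python
class LowQualityError(Exception):
    """Raised to halt downstream processing when data quality is too low."""


def quality_score(records):
    """Hypothetical metric: fraction of records with no missing fields."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records if all(v is not None for v in r.values())
    )
    return complete / len(records)


def quality_gate(records, threshold=0.9):
    """Pass records through only if their quality meets the threshold.

    In a workflow setting, the raised exception would signal the engine
    to stop execution instead of consuming further grid/cloud resources.
    """
    score = quality_score(records)
    if score < threshold:
        raise LowQualityError(
            f"quality {score:.2f} below threshold {threshold:.2f}"
        )
    return records
```

A workflow step would wrap its inputs in such a gate, so that low-quality sensor or database records abort the run early rather than propagating into expensive downstream computation.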