Monitoring data quality in Kepler

Authors:
Aisa Na'im (Monash University, Caulfield East, Australia)
Daniel Crawl (San Diego Supercomputer Centre, San Diego, CA)
Maria Indrawan (Monash University, Caulfield East, Australia)
Ilkay Altintas (San Diego Supercomputer Centre, San Diego, CA)
Shulei Sun (University of California San Diego)
Venue:
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Year:
2010


Abstract

Data quality is an important component of modern scientific discovery. Many scientific discovery processes consume data from a diverse array of resources such as streaming sensor networks, web services, and databases. The validity of a scientific computation's results is highly dependent on the quality of these input data. Scientific workflow systems are increasingly used to automate scientific computations by facilitating experiment design, data capture, integration, processing, and analysis. These workflows may execute in grid or cloud environments, and if the data produced during workflow execution is deemed unusable or of low quality, execution should stop so that these valuable computing resources are not wasted. We propose an approach in the Kepler scientific workflow system for monitoring data quality and demonstrate its use in the oceanography and bioinformatics domains.
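The abstract's core idea is to check data quality while the workflow runs and to halt before low-quality data reaches expensive downstream steps. The sketch below illustrates that idea under stated assumptions. It is not Kepler's actor API or the authors' implementation. The names `QualityThresholds`, `monitor`, `run_workflow`, and `LowQualityDataError` are hypothetical. The two metrics, completeness (the fraction of readings present) and validity (the fraction of present readings inside a plausible range), are also illustrative choices, and so are the threshold values.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# A batch of sensor readings; None marks a missing value.
Batch = Sequence[Optional[float]]


class LowQualityDataError(RuntimeError):
    """Raised to halt the (hypothetical) workflow when a batch fails its quality check."""


@dataclass(frozen=True)
class QualityThresholds:
    # All values below are illustrative assumptions, not taken from the paper.
    min_completeness: float = 0.9                     # at least 90% of readings present
    min_validity: float = 0.95                        # at least 95% of present readings in range
    valid_range: Tuple[float, float] = (-2.0, 40.0)   # e.g. plausible sea temperature, deg C


def completeness(batch: Batch) -> float:
    """Fraction of readings that are not missing."""
    if not batch:
        return 0.0
    return sum(v is not None for v in batch) / len(batch)


def validity(batch: Batch, lo: float, hi: float) -> float:
    """Fraction of present readings that fall inside [lo, hi]."""
    present = [v for v in batch if v is not None]
    if not present:
        return 0.0
    return sum(lo <= v <= hi for v in present) / len(present)


def monitor(batches: Iterable[Batch], t: QualityThresholds) -> Iterator[Batch]:
    """Pass batches through unchanged, raising as soon as one falls below a threshold."""
    for i, batch in enumerate(batches):
        c = completeness(batch)
        v = validity(batch, *t.valid_range)
        if c < t.min_completeness or v < t.min_validity:
            raise LowQualityDataError(f"batch {i}: completeness={c:.2f}, validity={v:.2f}")
        yield batch


def run_workflow(batches: Iterable[Batch], t: Optional[QualityThresholds] = None) -> List[float]:
    """Stand-in workflow: a quality monitor followed by a downstream 'analysis' step."""
    t = t or QualityThresholds()
    results = []
    for batch in monitor(batches, t):
        present = [v for v in batch if v is not None]
        if present:  # guard in case thresholds are relaxed to admit empty batches
            results.append(sum(present) / len(present))  # placeholder for expensive processing
    return results


if __name__ == "__main__":
    good = [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]]
    print(run_workflow(good))  # [11.0, 14.0]

    bad = [[10.0, 11.0, 12.0], [None, None, 50.0]]
    try:
        run_workflow(bad)
    except LowQualityDataError as err:
        print("workflow stopped:", err)  # workflow stopped: batch 1: completeness=0.33, validity=0.00
```

`monitor` is a generator placed in front of the processing step. Because of that, a failing batch raises an exception before the downstream step ever receives it. This mirrors the abstract's point that execution should stop rather than spend resources on unusable data. A real deployment would choose metrics and thresholds per domain, for example per sensor type in oceanography.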