A provenance-based approach to evaluate data quality in eScience

  • Authors:
  • Joana E. Gonzales Malaverri;André Santanchè;Claudia Bauzer Medeiros

  • Affiliations:
  • Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil;Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil;Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil

  • Venue:
  • International Journal of Metadata, Semantics and Ontologies
  • Year:
  • 2014

Abstract

Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated into many business environments and into software engineering practices. In eScience environments, however, data quality assessment is complicated by the multiplicity and heterogeneity of the data sources and scientific experts involved in a given problem. This paper deals with the evaluation of the quality of data managed by eScience applications. Our approach is based on data provenance, i.e. the history of the origins of a given data product and of the transformations applied to it. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system.
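
To make the idea of deriving quality from provenance concrete, the following is a minimal sketch and not the paper's framework. It borrows only the Open Provenance Model notions of artifacts, processes, and the "used" / "wasGeneratedBy" relations. Everything else is an assumption made for illustration: the class names, the `reliability` factor attached to each process, the example data, and the scoring rule (a derived artifact is as good as its weakest input, scaled by the reliability of the process that produced it).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Artifact:
    """OPM-style artifact: a piece of data."""
    name: str
    # Quality assigned directly to a source artifact, in [0, 1].
    # None for derived artifacts, whose quality comes from provenance.
    source_quality: Optional[float] = None


@dataclass
class Process:
    """OPM-style process: an action that uses and generates artifacts."""
    name: str
    reliability: float  # hypothetical trust factor in [0, 1]


@dataclass
class ProvenanceGraph:
    # artifact name -> process that generated it (OPM "wasGeneratedBy")
    generated_by: Dict[str, Process] = field(default_factory=dict)
    # process name -> artifacts it consumed (OPM "used")
    used: Dict[str, List[Artifact]] = field(default_factory=dict)

    def record(self, output: Artifact, process: Process,
               inputs: List[Artifact]) -> None:
        """Record that `process` used `inputs` and generated `output`."""
        self.generated_by[output.name] = process
        self.used[process.name] = list(inputs)

    def quality(self, artifact: Artifact) -> float:
        """Assumed rule: weakest input quality times process reliability."""
        process = self.generated_by.get(artifact.name)
        if process is None:
            # No recorded history: the artifact must carry its own score.
            if artifact.source_quality is None:
                raise ValueError(f"no provenance or score for {artifact.name}")
            return artifact.source_quality
        inputs = self.used[process.name]
        weakest = min((self.quality(a) for a in inputs), default=1.0)
        return weakest * process.reliability


# Hypothetical two-step workflow: merge two sources, then interpolate.
station = Artifact("station_readings", source_quality=0.9)
satellite = Artifact("satellite_readings", source_quality=0.8)
merged = Artifact("merged_series")
final = Artifact("interpolated_series")

graph = ProvenanceGraph()
graph.record(merged, Process("merge", reliability=0.95), [station, satellite])
graph.record(final, Process("interpolate", reliability=0.9), [merged])

print(round(graph.quality(final), 3))
```

The weakest-input rule is only one possible choice. A real methodology could weight inputs differently or track several quality dimensions separately, and the sketch does not attempt to reproduce how the Taverna-based prototype captures provenance.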