Dependency discovery in data quality

  • Authors:
  • Daniele Barone; Fabio Stella; Carlo Batini

  • Affiliations:
  • Department of Computer Science, University of Toronto, Toronto, ON, Canada; Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy; Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy

  • Venue:
  • CAiSE'10: Proceedings of the 22nd International Conference on Advanced Information Systems Engineering
  • Year:
  • 2010

Abstract

A conceptual framework for the automatic discovery of dependencies between data quality dimensions is described. Dependency discovery consists of recovering the dependency structure for a set of data quality dimensions measured on the attributes of a database. The task is accomplished with a data mining approach: a Bayesian Network is learned from the database and then used to analyze dependencies between data quality dimensions associated with different attributes. The proposed framework is instantiated on a real-world database, and dependency discovery is presented for three data quality dimensions: accuracy, completeness, and consistency. The resulting Bayesian Network model shows how data quality can be improved while satisfying budget constraints.
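
To make the idea of recovering a dependency structure concrete, the following is a minimal sketch and not the authors' method: the abstract does not say which structure-learning algorithm is used, so a Chow-Liu tree (a maximum spanning tree over pairwise mutual information) stands in here. The column names, the toy rows, and the encoding of each quality dimension as a binary pass/fail flag per record are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the paper): each row is one database record,
# each column a binary flag (1 = check passed) for one quality dimension
# measured on one attribute.
COLUMNS = ["name.completeness", "zip.accuracy", "city.consistency"]
ROWS = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0),
    (0, 0, 0), (0, 1, 1), (1, 0, 0), (0, 1, 1),
]


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    marg_i = Counter(r[i] for r in rows)
    marg_j = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_edges(rows, n_cols):
    """Undirected Chow-Liu tree: Kruskal's algorithm on mutual information."""
    scored = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in combinations(range(n_cols), 2)),
        reverse=True,
    )
    parent = list(range(n_cols))  # union-find forest used to avoid cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for mi, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            edges.append((i, j, mi))
    return edges


edges = chow_liu_edges(ROWS, len(COLUMNS))
for i, j, mi in edges:
    print(f"{COLUMNS[i]} -- {COLUMNS[j]}  (MI = {mi:.3f} nats)")
```

In this toy table the zip.accuracy and city.consistency flags always agree, so the strongest edge links them. The tree is undirected; choosing a root and orienting edges away from it would yield a tree-structured Bayesian Network, which is only one of many structures a general learner could return.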