Context-aware replacement operations for data cleaning

  • Authors:
  • Stefan Brüggemann; Hans-Jürgen Appelrath

  • Affiliations:
  • OFFIS - Institute for Information Technology, Oldenburg, Germany; University of Oldenburg, Ammerländer Herrstr, Oldenburg, Germany

  • Venue:
  • Proceedings of the 2011 ACM Symposium on Applied Computing
  • Year:
  • 2011

Abstract

Data cleaning focuses on identifying and removing violations of consistency constraints. Existing approaches perform only statistical repair operations, i.e., inserting average or default values. The result is consistent data that no longer resembles the original inconsistent data. An ontology-based approach instead allows the detection of semantically related, context-aware correction suggestions. We introduce measures for the semantic distance between concepts in an ontology and define metrics that calculate the similarity of each correction suggestion to the invalid tuple. These suggestions can be presented to end users in data cleaning environments. We apply the approach in a cancer registry that collects data about cancer cases and show how it can support the registry's domain experts in data cleaning.
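
To make the idea of ranking correction suggestions by semantic distance more concrete, the following is a minimal sketch in Python. It is not the authors' method: the abstract does not state the actual metrics, so the sketch assumes a simple path-length distance over is-a edges and a similarity of 1 / (1 + distance). The toy ontology `IS_A`, its concept names, and the functions `semantic_distance`, `similarity`, and `rank_suggestions` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy ontology (assumption, not from the paper):
# each entry maps a child concept to its parent via an is-a edge.
IS_A = {
    "lung": "respiratory_organ",
    "bronchus": "respiratory_organ",
    "respiratory_organ": "organ",
    "stomach": "digestive_organ",
    "colon": "digestive_organ",
    "digestive_organ": "organ",
}


def neighbours(concept):
    """Return the concepts one is-a edge away, in either direction."""
    result = set()
    if concept in IS_A:
        result.add(IS_A[concept])
    result.update(child for child, parent in IS_A.items() if parent == concept)
    return result


def semantic_distance(a, b):
    """Number of is-a edges on the shortest path between a and b.

    Uses breadth-first search; returns None if the concepts are not connected.
    Path length is an assumed stand-in for the paper's distance measure.
    """
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        concept, dist = queue.popleft()
        if concept == b:
            return dist
        for nxt in neighbours(concept):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def similarity(a, b):
    """Map a distance to a score in (0, 1]; unconnected concepts score 0."""
    d = semantic_distance(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def rank_suggestions(invalid_value, candidates):
    """Order candidate replacement values by similarity to the invalid value."""
    return sorted(candidates, key=lambda c: similarity(invalid_value, c), reverse=True)


if __name__ == "__main__":
    # Suppose "lung" violates a consistency constraint and three values would repair it.
    for candidate in rank_suggestions("lung", ["stomach", "bronchus", "colon"]):
        print(candidate, round(similarity("lung", candidate), 3))
```

Under these assumptions, a candidate that shares a close ancestor with the invalid value is ranked ahead of one from a distant branch of the ontology, in contrast to a default value that ignores the original content of the tuple.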