Context-aware replacement operations for data cleaning

Authors:
Stefan Brüggemann;Hans-Jürgen Appelrath
Affiliations:
OFFIS - Institute for Information Technology, Oldenburg, Germany;University of Oldenburg, Ammerländer Herrstr, Oldenburg, Germany
Venue:
Proceedings of the 2011 ACM Symposium on Applied Computing
Year:
2011

Abstract

Data cleaning focuses on identifying and removing violations of consistency constraints. Existing approaches perform only statistical repair operations, e.g. inserting average or default values. This yields consistent data, but the repaired data no longer bear any similarity to the original inconsistent data. An ontology-based approach allows semantically related, context-aware correction suggestions to be detected. We introduce measures for the semantic distance between concepts in an ontology and define metrics for calculating the similarity of such correction suggestions to the invalid tuple. These suggestions can be presented to end users in data cleaning environments. We apply this approach in a cancer registry that collects data about cancer cases, and we show how it can support the registry's domain experts in data cleaning.
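The abstract does not spell out the distance measures or similarity metrics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain edge-counting distance over a small is-a hierarchy and ranks candidate replacement values by their closeness to the invalid value. The concept names, the `PARENT` table, and all function names are hypothetical.

```python
# Toy is-a hierarchy (child -> parent). All concept names are hypothetical
# and chosen only for illustration; they are not taken from the paper.
PARENT = {
    "lung": "respiratory_system",
    "bronchus": "respiratory_system",
    "respiratory_system": "organ",
    "stomach": "digestive_system",
    "colon": "digestive_system",
    "digestive_system": "organ",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Count the is-a edges on the shortest path between a and b."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(ancestors(a)):
        if concept in steps_from_b:
            # First shared ancestor on a's path is the lowest common one.
            return steps_from_a + steps_from_b[concept]
    raise ValueError("concepts share no common ancestor")


def similarity(a, b):
    """Map distance into (0, 1]; identical concepts score 1.0."""
    return 1.0 / (1.0 + distance(a, b))


def rank_suggestions(invalid_value, candidates):
    """Order candidate replacements by similarity to the invalid value."""
    return sorted(
        candidates,
        key=lambda c: similarity(invalid_value, c),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_suggestions("lung", ["colon", "bronchus", "stomach"]))
```

Edge counting is the simplest choice here. An information-content measure could replace `similarity` without changing how the ranking step works.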