Data quality through knowledge engineering

Authors:
Tamraparni Dasu;Gregg T. Vesonder;Jon R. Wright
Affiliations:
AT&T Labs - Research, Florham Park, NJ;AT&T Labs - Research, Florham Park, NJ;AT&T Labs - Research, Florham Park, NJ
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Citing 9
Cited 2

Statistical analysis with missing data

Statistical analysis with missing data
Data quality: management and technology

Data quality: management and technology
Constraints and databases

Constraints and databases
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data quality: the field guide

Data quality: the field guide
Expert Systems: Principles and Programming

Expert Systems: Principles and Programming
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem

Data Mining and Knowledge Discovery
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Exploratory Data Mining and Data Cleaning

Exploratory Data Mining and Data Cleaning

Using inheritance in a metadata based approach to data quality assessment

Proceedings of the first international workshop on Model driven service engineering and data quality and security
An extensible metadata framework for data quality assessment of composite structures

DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditionally, data quality programs have acted as a preprocessing stage to make data suitable for a data mining or analysis operation. Recently, data quality concepts have been applied to databases that support business operations such as provisioning and billing. Incorporating business rules that drive operations and their associated data processes is critically important to the success of such projects. However, there are many practical complications. For example, documentation on business rules is often meager. Rules change frequently. Domain knowledge is often fragmented across experts, and those experts do not always agree. Typically, rules have to be gathered from subject matter experts iteratively, and are discovered out of logical or procedural sequence, like a jigsaw puzzle. Our approach is to impement business rules as constraints on data in a classical expert system formalism sometimes called production rules. Our system works by allowing good data to pass through a system of constraints unchecked. Bad data violate constraints and are flagged, and then fed back after correction. Constraints are added incrementally as better understanding of the business rules is gained. We include a real-life case study.