An Interactive Framework for Data Cleaning

  • Authors:
  • Vijayshankar Raman;Joseph M. Hellerstein

  • Affiliations:
  • -;-

  • Venue:
  • An Interactive Framework for Data Cleaning
  • Year:
  • 2000

Abstract

Cleaning organizational data of discrepancies in structure and content is important for data warehousing and Enterprise Data Integration (EDI). Current commercial solutions for data cleaning involve many iterations of time-consuming "data quality" analysis to find errors, and long-running transformations to fix them. Users need to endure long waits and often write complex transformation programs. We present an interactive framework for data cleaning that tightly integrates transformation and discrepancy detection. Users gradually build transformations by adding or undoing transforms, in an intuitive, graphical manner through a spreadsheet-like interface; the effect of a transform is shown at once on records visible on screen. In the background, the system incrementally searches for discrepancies on the latest transformed version of data, flagging them as they are found. This allows users to gradually construct a transformation as discrepancies are found, and clean the data without writing complex programs or enduring long delays. Balancing the goals of power, ease of specification, and interactive application, we choose a set of transforms that can be used for transformations within data records as well as for higher-order transformations. We also present initial work on optimizing a sequence of transforms.
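
The workflow described in the abstract (add or undo transforms, preview the effect on visible records only, and flag discrepancies against the latest transformed data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `TransformPipeline`, `split_column`, and `find_discrepancies` are hypothetical names introduced here, and the regular-expression check is an assumed stand-in for the paper's discrepancy detection.

```python
import re
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
Transform = Callable[[Record], Record]


class TransformPipeline:
    """Hypothetical sketch: an undoable sequence of record-level transforms."""

    def __init__(self) -> None:
        self._steps: List[Transform] = []

    def add(self, step: Transform) -> None:
        # Append a transform; nothing is applied to the stored data yet.
        self._steps.append(step)

    def undo(self) -> None:
        # Drop the most recently added transform, if there is one.
        if self._steps:
            self._steps.pop()

    def apply(self, record: Record) -> Record:
        # Run every transform, in order, on a copy of one record.
        out = dict(record)
        for step in self._steps:
            out = step(out)
        return out

    def preview(self, records: List[Record], start: int, count: int) -> List[Record]:
        # Transform only the rows currently "visible on screen".
        return [self.apply(r) for r in records[start:start + count]]


def split_column(source: str, left: str, right: str, sep: str) -> Transform:
    """Assumed example transform: split one column into two at the first separator."""

    def step(record: Record) -> Record:
        out = dict(record)
        first, _, rest = out.pop(source, "").partition(sep)
        out[left] = first.strip()
        out[right] = rest.strip()
        return out

    return step


def find_discrepancies(
    records: List[Record], pipeline: TransformPipeline, column: str, pattern: str
) -> Iterator[int]:
    """Yield indexes of rows whose transformed value does not match the pattern.

    Written as a generator so flags can be reported one at a time, as found.
    """
    expected = re.compile(pattern)
    for index, record in enumerate(records):
        value = pipeline.apply(record).get(column, "")
        if not expected.fullmatch(value):
            yield index
```

As a usage illustration with made-up rows: after `pipeline.add(split_column("name", "last", "first", ","))`, calling `pipeline.preview(rows, 0, 2)` transforms only the first two rows, `find_discrepancies(rows, pipeline, "first", r"[A-Za-z]+")` flags rows whose `name` had no comma to split on, and `pipeline.undo()` restores the original view. Storing transforms as a list of functions, and leaving the source rows untouched, is what makes undo cheap in this sketch.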