A demonstration of DBWipes: clean as you query

  • Authors:
  • Eugene Wu;Samuel Madden;Michael Stonebraker

  • Affiliations:
  • MIT CSAIL;MIT CSAIL;MIT CSAIL

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

As data analytics becomes mainstream, and the complexity of the underlying data and computation grows, it will be increasingly important to provide tools that help analysts understand the underlying reasons when they encounter errors in the result. While data provenance has been a large step in providing tools to help debug complex workflows, its current form has limited utility when debugging aggregation operators that compute a single output from a large collection of inputs. Traditional provenance will return the entire input collection, which has very low precision. In contrast, users are seeking precise descriptions of the inputs that caused the errors. We propose a Ranked Provenance System, which identifies subsets of inputs that influenced the output error, describes each subset with human readable predicates and orders them by contribution to the error. In this demonstration, we will present DBWipes, a novel data cleaning system that allows users to execute aggregate queries, and interactively detect, understand, and clean errors in the query results. Conference attendees will explore anomalies in campaign donations from the current US presidential election and in readings from a 54-node sensor deployment.