Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Subgroup Discovery with CN2-SD
The Journal of Machine Learning Research
Taverna: lessons in creating a workflow environment for the life sciences: Research Articles
Concurrency and Computation: Practice & Experience - Workflow in Grid Systems
Wrangler: interactive visual specification of data transformation scripts
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Sensitivity analysis and explanations for robust query evaluation in probabilistic databases
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Intel "big data" science and technology center vision and execution plan
ACM SIGMOD Record
Hi-index | 0.00 |
As data analytics becomes mainstream, and the complexity of the underlying data and computation grows, it will be increasingly important to provide tools that help analysts understand the underlying reasons when they encounter errors in the result. While data provenance has been a large step in providing tools to help debug complex workflows, its current form has limited utility when debugging aggregation operators that compute a single output from a large collection of inputs. Traditional provenance will return the entire input collection, which has very low precision. In contrast, users are seeking precise descriptions of the inputs that caused the errors. We propose a Ranked Provenance System, which identifies subsets of inputs that influenced the output error, describes each subset with human readable predicates and orders them by contribution to the error. In this demonstration, we will present DBWipes, a novel data cleaning system that allows users to execute aggregate queries, and interactively detect, understand, and clean errors in the query results. Conference attendees will explore anomalies in campaign donations from the current US presidential election and in readings from a 54-node sensor deployment.