Jigsaw: efficient optimization over uncertain enterprise data

Authors:
Oliver A. Kennedy;Suman Nath
Affiliations:
EPFL, Lausanne, Switzerland;Microsoft Research, Seattle, WA, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Citing 12
Cited 2

Undecidability of static analysis

ACM Letters on Programming Languages and Systems (LOPLAS)
MYSTIQ: a system for finding more answers by using probabilities

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
MauveDB: supporting model-based user views in database systems

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Discrete Cosine Transform

IEEE Transactions on Computers
A general framework for modeling and processing optimization queries

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
MCDB: a Monte Carlo approach to managing uncertain data

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Querying continuous functions in a database system

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Orion 2.0: native support for uncertain data

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Simultaneous Equation Systems for Query Processing on Continuous-Time Data Streams

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
E = MC³: managing uncertain enterprise data in a cluster-computing environment

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
A demonstration of SciDB: a science-oriented DBMS

Proceedings of the VLDB Endowment
MCDB-R: risk analysis in the database

Proceedings of the VLDB Endowment

Fuzzy prophet: parameter exploration in uncertain enterprise scenarios

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Simulation of database-valued Markov chains using SimSQL

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Abstract

Probabilistic databases, particularly those that let users define models or probability distributions externally (so-called VG-Functions), are an ideal tool for constructing, simulating, and analyzing hypothetical business scenarios. Enterprises often use such tools with parameterized models and must explore a large parameter space to discover parameter values that optimize a given goal. Because the parameter space is usually very large, this exploration is extremely expensive. We present Jigsaw, a simulation framework built on a probabilistic database that addresses this performance problem. In Jigsaw, users define what-if scenarios as parameterized probabilistic database queries and identify parameter values that achieve desired properties. Jigsaw uses a novel "fingerprinting" technique that efficiently identifies correlations between a query's output distributions for different parameter values. Using fingerprints, Jigsaw reuses work performed for different parameter values and obtains speedups of as much as two orders of magnitude on several real business scenarios.
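
The abstract does not say how fingerprints are computed or compared, so the following is only a rough sketch of the general idea of reusing work across parameter values, not Jigsaw's actual implementation. It assumes that a fingerprint is the output of a stochastic model on a small fixed set of random seeds, and that two parameter values are "correlated" when their fingerprints are related by an affine map. The model `demand` and the helpers `fingerprint`, `affine_map`, `simulate`, and `explore` are all hypothetical names introduced for illustration.

```python
import random

def demand(price, seed):
    """Hypothetical VG-function: one random demand sample for a given price."""
    rng = random.Random(seed)          # same seed -> same underlying noise
    noise = rng.gauss(0.0, 1.0)
    return (100.0 - 2.0 * price) + 5.0 * noise

# Assumption: a fingerprint is the model output on a small fixed seed set.
FINGERPRINT_SEEDS = list(range(8))

def fingerprint(price):
    return [demand(price, s) for s in FINGERPRINT_SEEDS]

def affine_map(fp_src, fp_dst, tol=1e-6):
    """Return (a, b) with fp_dst ~= a * fp_src + b, or None if no such fit."""
    x0, x1 = fp_src[0], fp_src[1]
    y0, y1 = fp_dst[0], fp_dst[1]
    if abs(x1 - x0) < 1e-12:
        return None
    a = (y1 - y0) / (x1 - x0)
    b = y0 - a * x0
    # Accept the map only if it explains every fingerprint entry.
    fits = all(abs(a * x + b - y) <= tol for x, y in zip(fp_src, fp_dst))
    return (a, b) if fits else None

def simulate(price, n=2000):
    """Full (expensive) Monte Carlo run for one parameter value."""
    return [demand(price, s) for s in range(1000, 1000 + n)]

def explore(prices, n=2000):
    """Simulate each price, reusing earlier samples when fingerprints match."""
    basis = []        # (fingerprint, samples) pairs from full runs
    results = {}
    full_runs = 0
    for p in prices:
        fp = fingerprint(p)
        for basis_fp, basis_samples in basis:
            mapping = affine_map(basis_fp, fp)
            if mapping is not None:
                a, b = mapping
                # Cheap path: transform stored samples instead of re-running.
                results[p] = [a * x + b for x in basis_samples]
                break
        else:
            samples = simulate(p, n)
            full_runs += 1
            basis.append((fp, samples))
            results[p] = samples
    return results, full_runs
```

In this toy model every price shares the same noise term, so a call such as `explore([10.0, 20.0, 30.0])` needs only one full simulation; the other two output distributions are derived from the stored samples. Real scenarios would need richer mappings and a fallback to full simulation when no mapping fits.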