The Monte Carlo database system: Stochastic analysis close to the data

Authors:
Ravi Jampani; Fei Xu; Mingxi Wu; Luis Perez; Chris Jermaine; Peter J. Haas
Affiliations:
University of Florida, Gainesville, FL; Microsoft Corporation, Redmond, WA; Oracle Corporation, Redwood Shores, CA; Rice University, Houston, TX; Rice University, Houston, TX; IBM Almaden Research Center, San Jose, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2011

Abstract

The application of stochastic models and analysis techniques to large datasets is now commonplace. Unfortunately, in practice this usually means extracting data from a database system into an external tool (such as SAS, R, Arena, or Matlab) and then running the analysis there. This extract-and-model paradigm is typically error-prone and slow, does not support fine-grained modeling, and discourages what-if and sensitivity analyses. In this article we describe MCDB, a database system that permits a wide spectrum of stochastic models to be used in conjunction with the data stored in a large database, without ever extracting the data. MCDB facilitates in-database execution of tasks such as risk assessment, prediction, and imputation of missing data, as well as management of errors due to data integration, information extraction, and privacy-preserving data anonymization. MCDB allows a user to define “random” relations whose contents are determined by stochastic models. The models can then be queried using standard SQL. Monte Carlo techniques are used to analyze the probability distribution of the result of an SQL query over random relations. Novel “tuple-bundle” processing techniques can effectively control the Monte Carlo overhead, as shown in our experiments.
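The general idea of analyzing a query result over a random relation by Monte Carlo can be illustrated with a minimal Python sketch. This is not MCDB's interface or implementation: the table `PARAMS`, the normal-distribution model, and the functions `sample_random_relation`, `total_demand`, and `monte_carlo_query` are all hypothetical names chosen for illustration, and the aggregate is written in plain Python rather than SQL.

```python
import random
import statistics

# Hypothetical deterministic parameter table: (customer_id, mean, std_dev).
# In this sketch it stands in for ordinary stored data that parameterizes a model.
PARAMS = [
    (1, 100.0, 10.0),
    (2, 250.0, 40.0),
    (3, 80.0, 5.0),
]


def sample_random_relation(rng):
    """Draw one realization of a random relation DEMAND(customer_id, demand).

    Assumption: each uncertain demand value follows an independent normal
    distribution with the parameters stored in PARAMS.
    """
    return [(cid, rng.gauss(mu, sigma)) for cid, mu, sigma in PARAMS]


def total_demand(relation, min_demand=0.0):
    """Plain-Python stand-in for:
    SELECT SUM(demand) FROM DEMAND WHERE demand > min_demand
    """
    return sum(demand for _, demand in relation if demand > min_demand)


def monte_carlo_query(n_samples=1000, seed=42):
    """Run the query once per sampled database instance.

    The returned list is an empirical sample from the distribution of the
    query result.
    """
    rng = random.Random(seed)
    return [total_demand(sample_random_relation(rng)) for _ in range(n_samples)]


results = monte_carlo_query()
mean_total = statistics.mean(results)   # estimate of the expected query result
std_total = statistics.stdev(results)   # estimate of its spread
```

This naive loop regenerates the relation and reruns the whole query for every sample, so its cost grows linearly with the number of samples; that repeated work is the kind of Monte Carlo overhead the abstract says the tuple-bundle techniques are meant to control.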