A probabilistic relational algebra for the integration of information retrieval and database systems
ACM Transactions on Information Systems (TOIS)
The Management of Probabilistic Data
IEEE Transactions on Knowledge and Data Engineering
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Monte Carlo Statistical Methods (Springer Texts in Statistics)
Monte Carlo Statistical Methods (Springer Texts in Statistics)
U-DBMS: a database system for managing constantly-evolving data
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Clean Answers over Dirty Databases: A Probabilistic Approach
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Approximate Data Collection in Sensor Networks using Probabilistic Models
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Improved long-period generators based on linear recurrences modulo 2
ACM Transactions on Mathematical Software (TOMS)
MauveDB: supporting model-based user views in database systems
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Creating probabilistic databases from information extraction models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Trio: a system for data, uncertainty, and lineage
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Management of probabilistic data: foundations and challenges
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The dichotomy of conjunctive queries on probabilistic structures
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient query evaluation on probabilistic databases
The VLDB Journal — The International Journal on Very Large Data Bases
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Databases with uncertainty and lineage
The VLDB Journal — The International Journal on Very Large Data Bases
MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Querying continuous functions in a database system
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Orion 2.0: native support for uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Managing Probabilistic Data with MystiQ: The Can-Do, the Could-Do, and the Can't-Do
SUM '08 Proceedings of the 2nd international conference on Scalable Uncertainty Management
Conditioning probabilistic databases
Proceedings of the VLDB Endowment
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Proceedings of the VLDB Endowment
Data integration with uncertainty
The VLDB Journal — The International Journal on Very Large Data Bases
Probabilistic databases: diamonds in the dirt
Communications of the ACM - Barbara Liskov: ACM's A.M. Turing Award Winner
Fast and Simple Relational Processing of Uncertain Data
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Uncertainty management in rule-based information extraction systems
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
E = MC3: managing uncertain enterprise data in a cluster-computing environment
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Representing uncertain data: models, properties, and algorithms
The VLDB Journal — The International Journal on Very Large Data Bases
The trichotomy of HAVING queries on a probabilistic database
The VLDB Journal — The International Journal on Very Large Data Bases
Query evaluation over probabilistic XML
The VLDB Journal — The International Journal on Very Large Data Bases
PrDB: managing and exploiting rich correlations in probabilistic databases
The VLDB Journal — The International Journal on Very Large Data Bases
MAD skills: new analysis practices for big data
Proceedings of the VLDB Endowment
Evaluation of probabilistic threshold queries in MCDB
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
MCDB-R: risk analysis in the database
Proceedings of the VLDB Endowment
Towards high-throughput gibbs sampling at scale: a study across storage managers
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Simulation of database-valued markov chains using SimSQL
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
The application of stochastic models and analysis techniques to large datasets is now commonplace. Unfortunately, in practice this usually means extracting data from a database system into an external tool (such as SAS, R, Arena, or Matlab), and then running the analysis there. This extract-and-model paradigm is typically error-prone, slow, does not support fine-grained modeling, and discourages what-if and sensitivity analyses. In this article we describe MCDB, a database system that permits a wide spectrum of stochastic models to be used in conjunction with the data stored in a large database, without ever extracting the data. MCDB facilitates in-database execution of tasks such as risk assessment, prediction, and imputation of missing data, as well as management of errors due to data integration, information extraction, and privacy-preserving data anonymization. MCDB allows a user to define “random” relations whose contents are determined by stochastic models. The models can then be queried using standard SQL. Monte Carlo techniques are used to analyze the probability distribution of the result of an SQL query over random relations. Novel “tuple-bundle” processing techniques can effectively control the Monte Carlo overhead, as shown in our experiments.