Random number generators: good ones are hard to find
Communications of the ACM
Communications of the ACM - Special issue on simulation
Parallel database systems: the future of high performance database systems
Communications of the ACM
A random number generator based on the combination of four LCGs
Mathematics and Computers in Simulation - Special issue: papers presented at the MSSA/IMACS 11th biennial conference on modelling and simulation
MYSTIQ: a system for finding more answers by using probabilities
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Improved long-period generators based on linear recurrences modulo 2
ACM Transactions on Mathematical Software (TOMS)
Trio: a system for data, uncertainty, and lineage
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Orion 2.0: native support for uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Proceedings of the VLDB Endowment
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
Modeling and querying probabilistic XML data
ACM SIGMOD Record
Efficient Jump Ahead for F2-Linear Random Number Generators
INFORMS Journal on Computing
Jigsaw: efficient optimization over uncertain enterprise data
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
The monte carlo database system: Stochastic analysis close to the data
ACM Transactions on Database Systems (TODS)
Database foundations for scalable RDF processing
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Splash: a platform for analysis and simulation of health
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
Efficient subject-oriented evaluating and mining methods for data with schema uncertainty
ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Myriad: parallel data generation on shared-nothing architectures
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Modern enterprises must manage uncertain data for risk assessment and decision making under uncertainty. The Monte Carlo approach embodied in the MCDB system of Jampani et al. is well suited to this task. MCDB can support industrial-strength business-intelligence queries over uncertain warehouse data. Moreover, MCDB's extensible approach to specifying uncertainty can also capture complex stochastic prediction models, allowing sophisticated "what-if" analyses within the DBMS. MCDB computations can be highly CPU-intensive, but they offer the potential for massive parallelization. To realize this potential, we provide a new system, called MC3 (Monte Carlo Computation on a Cluster), that extends the MCDB approach to the map-reduce processing framework. MC3 can exploit the robustness and scalability of map-reduce and can handle data stored in non-relational formats. We show how MCDB query plans over "tuple bundles" can be translated to sequences of map-reduce operations over nested data, and we describe different parallelization schemes. We also provide and analyze several novel distributed algorithms for adding pseudorandom number seeds to tuple bundles. These algorithms ensure statistical correctness of the Monte Carlo computations while minimizing the seed length. Our experiments show that MC3 can scale well for a variety of workloads.
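To make the role of per-tuple seeds concrete, the following is a minimal Python sketch, not the MC3 implementation. It assumes each uncertain record carries a hypothetical `(key, seed, mean_value)` triple, that uncertainty is Gaussian with unit variance, and that seeds are assigned by hand. The seed-assignment algorithms described in the abstract are not reproduced here, and distinct integer seeds for Python's `random.Random` are only a stand-in for properly independent streams. The sketch shows one idea only: a stored seed lets any worker regenerate the same Monte Carlo instantiations of a tuple, so a map step can expand tuples into bundles of sampled values and a reduce step can aggregate them world by world.

```python
import random
from statistics import mean


def instantiate(seed, mean_value, n_worlds):
    """Regenerate the sampled values of one uncertain tuple from its seed.

    Assumption: unit-variance Gaussian uncertainty around mean_value.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean_value, 1.0) for _ in range(n_worlds)]


def map_phase(records, n_worlds):
    """Expand each (key, seed, mean_value) record into a bundle of samples."""
    for key, seed, mean_value in records:
        yield key, instantiate(seed, mean_value, n_worlds)


def reduce_phase(pairs, n_worlds):
    """Sum the bundles per key, keeping one running total per Monte Carlo world."""
    totals = {}
    for key, samples in pairs:
        acc = totals.setdefault(key, [0.0] * n_worlds)
        for i, value in enumerate(samples):
            acc[i] += value
    return totals


# Hypothetical input: two uncertain sales figures for "east", one for "west".
N_WORLDS = 1000
records = [("east", 101, 10.0), ("east", 102, 12.0), ("west", 103, 7.0)]

result = reduce_phase(map_phase(records, N_WORLDS), N_WORLDS)

# Each key now has N_WORLDS samples of SUM(value); the mean estimates the expectation.
estimates = {key: mean(samples) for key, samples in result.items()}
```

Because the samples are a pure function of the seed, rerunning `instantiate(101, 10.0, N_WORLDS)` on a different node yields the same bundle, which is what allows the values to be regenerated on demand rather than shipped between workers.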