Simulation of database-valued Markov chains using SimSQL

  • Authors:
  • Zhuhua Cai;Zografoula Vagena;Luis Perez;Subramanian Arumugam;Peter J. Haas;Christopher Jermaine

  • Affiliations:
  • Rice University, Houston, TX, USA;LogicBlox, Inc., Atlanta, GA, USA;Rice University, Houston, TX, USA;Rice University, Houston, TX, USA;IBM Almaden, San Jose, CA, USA;Rice University, Houston, TX, USA

  • Venue:
  • Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2013

Abstract

This paper describes the SimSQL system, which allows for SQL-based specification, simulation, and querying of database-valued Markov chains, i.e., chains whose value at any time step comprises the contents of an entire database. SimSQL extends the earlier Monte Carlo database system (MCDB), which permitted Monte Carlo simulation of static database-valued random variables. Like MCDB, SimSQL uses user-specified "VG functions" to generate the simulated data values that are the building blocks of a simulated database. The enhanced functionality of SimSQL is enabled by the ability to parametrize VG functions using stochastic tables, so that one stochastic database can be used to parametrize the generation of another stochastic database, which can parametrize another, and so on. Other key extensions include the ability to explicitly define recursive versions of a stochastic table and the ability to execute the simulation in a MapReduce environment. We focus on applying SimSQL to Bayesian machine learning.
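The idea of a database-valued Markov chain can be illustrated with a minimal Python sketch. This is not SimSQL syntax and not the authors' implementation: the names `normal_vg`, `step`, and `simulate`, the table layout (a list of row dictionaries), and the choice of a Gaussian random walk are all assumptions made purely for illustration. The sketch shows a "VG function" that generates one simulated value per parameter row, and a recursive definition in which version i of a stochastic table is generated from parameters taken from version i-1.

```python
import random


def normal_vg(rng, params):
    """Hypothetical VG function: draw one Gaussian value per parameter row."""
    return [
        {"id": row["id"], "value": rng.gauss(row["mean"], row["std"])}
        for row in params
    ]


def step(rng, prev_table):
    """Generate table version i, parametrized by stochastic table version i-1."""
    # The previous (random) values become the means for the next draw,
    # so the next state depends only on the current state (Markov property).
    params = [
        {"id": row["id"], "mean": row["value"], "std": 1.0}
        for row in prev_table
    ]
    return normal_vg(rng, params)


def simulate(n_steps, seed=0):
    """Return the chain as a list of tables: versions 0 through n_steps."""
    rng = random.Random(seed)
    # Base case: version 0 is parametrized by a fixed, deterministic table.
    init_params = [{"id": k, "mean": 0.0, "std": 1.0} for k in range(3)]
    chain = [normal_vg(rng, init_params)]
    for _ in range(n_steps):
        chain.append(step(rng, chain[-1]))
    return chain


chain = simulate(5)
for i, table in enumerate(chain):
    print(i, [round(row["value"], 3) for row in table])
```

Each element of `chain` stands in for the whole database at one time step. In a Bayesian machine learning setting, a step would instead draw model variables conditioned on the current state, which is where the single-table random walk above would be replaced by several interdependent stochastic tables.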