BayesStore: managing large, uncertain data repositories with probabilistic graphical models

Authors:
Daisy Zhe Wang;Eirinaios Michelakis;Minos Garofalakis;Joseph M. Hellerstein
Affiliations:
Univ. of California, Berkeley EECS;Univ. of California, Berkeley EECS;Yahoo! Research and Univ. of California, Berkeley EECS;Univ. of California, Berkeley EECS
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Citing 14
Cited 36

Incomplete Information in Relational Databases

Journal of the ACM (JACM)
A probabilistic relational algebra for the integration of information retrieval and database systems

ACM Transactions on Information Systems (TOIS)
Probabilistic Networks and Expert Systems

Probabilistic Networks and Expert Systems
The Management of Probabilistic Data

IEEE Transactions on Knowledge and Data Engineering
The Theory of Probabilistic Databases

VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Learning Probabilistic Relational Models

IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Probabilistic reasoning for complex systems

Probabilistic reasoning for complex systems
MauveDB: supporting model-based user views in database systems

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
ULDBs: databases with uncertainty and lineage

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Creating probabilistic databases from information extraction models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Efficient query evaluation on probabilistic databases

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Fast and Simple Relational Processing of Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
First-order probabilistic inference

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Discriminative probabilistic models for relational data

UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence

Uncertainty management in rule-based information extraction systems

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
E = MC3: managing uncertain enterprise data in a cluster-computing environment

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Indexing correlated probabilistic databases

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Representing uncertain data: models, properties, and algorithms

The VLDB Journal — The International Journal on Very Large Data Bases
Do you know your IQ?: a research agenda for information quality in systems

ACM SIGMETRICS Performance Evaluation Review
PODS: a new model and processing algorithms for uncertain data streams

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Consistent query answers in inconsistent probabilistic databases

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Threshold query optimization for uncertain data

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A generic framework for handling uncertain data with local correlations

Proceedings of the VLDB Endowment
Set similarity join on probabilistic data

Proceedings of the VLDB Endowment
Scalable probabilistic databases with factor graphs and MCMC

Proceedings of the VLDB Endowment
A*-tree: a structure for storage and modeling of uncertain multidimensional arrays

Proceedings of the VLDB Endowment
k-nearest neighbors in uncertain graphs

Proceedings of the VLDB Endowment
Querying probabilistic information extraction

Proceedings of the VLDB Endowment
Probabilistic inverse ranking queries in uncertain databases

The VLDB Journal — The International Journal on Very Large Data Bases
Tractability in probabilistic databases

Proceedings of the 14th International Conference on Database Theory
Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS

Proceedings of the VLDB Endowment
Efficient query answering in probabilistic RDF graphs

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Hybrid in-database inference for declarative information extraction

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Latent OLAP: data cubes over latent variables

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Cleaning uncertain streams for query improvement

APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
The monte carlo database system: Stochastic analysis close to the data

ACM Transactions on Database Systems (TODS)
Database foundations for scalable RDF processing

RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Cost-efficient repair in inconsistent probabilistic databases

Proceedings of the 20th ACM international conference on Information and knowledge management
Interactive reasoning in uncertain RDF knowledge bases

Proceedings of the 20th ACM international conference on Information and knowledge management
Probabilistic management of OCR data using an RDBMS

Proceedings of the VLDB Endowment
Towards a unified architecture for in-RDBMS analytics

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Local structure and determinism in probabilistic databases

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
CLARO: modeling and processing uncertain data streams

The VLDB Journal — The International Journal on Very Large Data Bases
Automatic knowledge base construction using probabilistic extraction, deductive reasoning, and human feedback

AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
MADden: query-driven statistical text analytics

Proceedings of the 21st ACM international conference on Information and knowledge management
Towards high-throughput gibbs sampling at scale: a study across storage managers

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Beyond myopic inference in big data pipelines

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Causality and responsibility: probabilistic queries revisited in uncertain databases

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Tabular: a schema-driven probabilistic programming language

Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Approximation trade-offs in a Markovian stream warehouse: An empirical study

Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex spatio/temporal correlation patterns present in uncertain sensory data. Unfortunately, to date, most existing approaches to probabilistic database systems have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: Probabilistic information is typically associated with individual data tuples, with only limited or no support for effectively capturing and reasoning about complex data correlations. In this paper, we introduce BayesStore, a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BAYESSTORE employs concise statistical relational models to effectively encode the correlation patterns between uncertain data, and promotes probabilistic inference and statistical model manipulation as part of the standard DBMS operator repertoire to support efficient and sound query processing. We present BAYESSTORE's uncertainty model based on a novel, first-order statistical model, and we redefine traditional query processing operators, to manipulate the data and the probabilistic models of the database in an efficient manner. Finally, we validate our approach, by demonstrating the value of exploiting data correlations during query processing, and by evaluating a number of optimizations which significantly accelerate query processing.