Efficiently updating materialized views
SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
A probabilistic relational algebra for the integration of information retrieval and database systems
ACM Transactions on Information Systems (TOIS)
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
MYSTIQ: a system for finding more answers by using probabilities
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Machine Learning
Clean Answers over Dirty Databases: A Probabilistic Approach
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Efficient query evaluation on probabilistic databases
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Proceedings of the VLDB Endowment
Exploiting shared correlations in probabilistic databases
Proceedings of the VLDB Endowment
Sampling-based estimators for subset-based queries
The VLDB Journal — The International Journal on Very Large Data Bases
Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Joint inference in information extraction
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Database researchers: plumbers or thinkers?
Proceedings of the 14th International Conference on Extending Database Technology
Hybrid in-database inference for declarative information extraction
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Database foundations for scalable RDF processing
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Interactive reasoning in uncertain RDF knowledge bases
Proceedings of the 20th ACM international conference on Information and knowledge management
Towards a unified architecture for in-RDBMS analytics
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Local structure and determinism in probabilistic databases
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Probabilistic databases with MarkoViews
Proceedings of the VLDB Endowment
Monte Carlo MCMC: efficient inference by approximate sampling
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Towards high-throughput gibbs sampling at scale: a study across storage managers
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Simulation of database-valued markov chains using SimSQL
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Adaptive exploration for large-scale protein analysis in the molecular dynamics database
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Hi-index | 0.00 |
Incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or treatment of relational algebra operators. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing modifications to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our system's ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization on a real-world complex model of information extraction. This framework is sufficiently expressive to support probabilistic inference not only for answering queries, but also for inferring missing database content from raw evidence.