Factor graphs and Gibbs sampling are a popular combination for Bayesian statistical methods used to solve diverse problems, including insurance risk models, pricing models, and information extraction. Given a fixed sampling method and a fixed amount of time, a sampler implementation that achieves higher sample throughput will produce higher-quality results than a lower-throughput one. We study how (and whether) traditional data-processing choices about materialization, page layout, and buffer-replacement policy need to change to achieve high-throughput Gibbs sampling for factor graphs that are larger than main memory. We find that both new theoretical and new algorithmic techniques are required to understand the tradeoff space for each choice. On both real and synthetic data, we demonstrate that traditional baseline approaches may achieve two orders of magnitude lower throughput than an optimal approach. For a handful of popular tasks across several storage backends, including HBase and traditional Unix files, we show that our simple prototype achieves competitive (and sometimes better) throughput compared to specialized state-of-the-art approaches on factor graphs that are larger than main memory.