We describe distributed algorithms for two widely used topic models: the Latent Dirichlet Allocation (LDA) model and the Hierarchical Dirichlet Process (HDP) model. In our distributed algorithms, the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA. The first is a straightforward mapping of LDA to a distributed processor setting: processors concurrently perform Gibbs sampling over their local data, followed by a global update of topic counts. This algorithm is simple to implement and can be viewed as an approximation to Gibbs-sampled LDA. The second uses a hierarchical Bayesian extension of LDA to account directly for distributed data. This model has a theoretical guarantee of convergence but is more complex to implement than the first algorithm. Our distributed algorithm for HDP takes the straightforward mapping approach and merges newly created topics either by matching or by topic-id. Using five real-world text corpora, we show that distributed learning works well in practice. For both LDA and HDP, we show that the converged test-data log probability for distributed learning is indistinguishable from that obtained with single-processor learning. Our extensive experimental results include learning topic models for two multi-million-document collections using a 1024-processor parallel computer.
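The first LDA algorithm (local Gibbs sampling followed by a global update of topic counts) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each processor samples against a private copy of the word-topic counts and that the global update adds every processor's change to the shared table; the abstract does not spell out that merge rule. All function names, the round-robin partitioning, and the hyperparameter defaults are hypothetical, and the "processors" are simulated one after another in a single process.

```python
import random

def init_state(docs, num_topics, vocab_size, rng):
    """Assign a random topic to every token and build the count tables."""
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = [[0] * num_topics for _ in range(vocab_size)]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            doc_topic[d][z[d][i]] += 1
            word_topic[w][z[d][i]] += 1
    return z, doc_topic, word_topic

def local_gibbs_sweep(doc_ids, docs, z, doc_topic, word_topic, alpha, beta, rng):
    """One collapsed Gibbs sweep over one partition, on a private copy of word_topic."""
    local = [row[:] for row in word_topic]  # private copy (assumed design)
    vocab_size, num_topics = len(local), len(local[0])
    totals = [sum(local[w][k] for w in range(vocab_size)) for k in range(num_topics)]
    for d in doc_ids:
        for i, w in enumerate(docs[d]):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            doc_topic[d][k] -= 1
            local[w][k] -= 1
            totals[k] -= 1
            # Standard collapsed Gibbs conditional for LDA (unnormalized).
            weights = [
                (doc_topic[d][t] + alpha) * (local[w][t] + beta)
                / (totals[t] + vocab_size * beta)
                for t in range(num_topics)
            ]
            k = rng.choices(range(num_topics), weights=weights)[0]
            z[d][i] = k
            doc_topic[d][k] += 1
            local[w][k] += 1
            totals[k] += 1
    return local

def global_update(word_topic, local_tables):
    """Assumed merge rule: global += sum over processors of (local - global)."""
    return [
        [g + sum(loc[w][k] - g for loc in local_tables) for k, g in enumerate(row)]
        for w, row in enumerate(word_topic)
    ]

def distributed_lda(docs, num_topics, vocab_size, num_procs, sweeps,
                    alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    z, doc_topic, word_topic = init_state(docs, num_topics, vocab_size, rng)
    # Round-robin partition of documents across simulated processors.
    partitions = [list(range(p, len(docs), num_procs)) for p in range(num_procs)]
    for _ in range(sweeps):
        # Each local sweep sees only the counts from the last global update,
        # so these calls are independent and could run concurrently.
        local_tables = [
            local_gibbs_sweep(part, docs, z, doc_topic, word_topic, alpha, beta, rng)
            for part in partitions
        ]
        word_topic = global_update(word_topic, local_tables)
    return z, doc_topic, word_topic

if __name__ == "__main__":
    toy_docs = [[0, 1, 2, 0], [2, 3, 3, 1], [0, 0, 1], [3, 2, 2, 3, 1]]
    _, _, counts = distributed_lda(toy_docs, num_topics=2, vocab_size=4,
                                   num_procs=2, sweeps=5)
    print(counts)
```

Within a sweep, each simulated processor works from word-topic counts that are stale with respect to the other processors, which is why this scheme only approximates single-processor Gibbs sampling. Because each processor changes only the assignments of its own tokens, the merged table still agrees exactly with the current topic assignments after every global update.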