PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing

  • Authors:
  • Zhiyuan Liu; Yuzhou Zhang; Edward Y. Chang; Maosong Sun

  • Affiliations:
  • Google Inc., China; Google Inc., China; Google Inc., China; Tsinghua University

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST)
  • Year:
  • 2011

Abstract

Previous methods of distributed Gibbs sampling for latent Dirichlet allocation (LDA) run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that these strategies substantially reduce the otherwise unparallelizable communication overhead and achieve good load balancing, thereby improving the scalability of LDA.
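The pipeline-processing idea can be illustrated with a small sketch: while a sampler works on the current bundle of words, the topic counts needed for the next bundle are fetched in the background, so communication overlaps with computation. This is a hypothetical simplification, not the paper's implementation; the names `fetch`, `sample`, and `update` are assumed placeholders for the real fetch/sample/update operations against the distributed word-topic matrix.

```python
from concurrent.futures import ThreadPoolExecutor

def pipeline_gibbs(bundles, fetch, sample, update):
    """Hypothetical sketch of pipelined Gibbs sampling.

    While the sampler processes bundle i, a background I/O thread
    prefetches the word-topic counts for bundle i+1, hiding
    communication latency behind computation.
    """
    if not bundles:
        return
    with ThreadPoolExecutor(max_workers=1) as io:
        future = io.submit(fetch, bundles[0])  # prefetch the first bundle
        for i, bundle in enumerate(bundles):
            counts = future.result()           # wait for this bundle's counts
            if i + 1 < len(bundles):
                # start fetching the next bundle before sampling this one
                future = io.submit(fetch, bundles[i + 1])
            update(bundle, sample(bundle, counts))
```

In a real deployment the fetch and update calls would be remote requests to the machines holding the relevant slices of the word-topic count matrix; here they are just callables so the overlap structure is visible.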