Previous methods for distributed Gibbs sampling in LDA (latent Dirichlet allocation) run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that these strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, thereby improving the scalability of LDA.
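To make the inference procedure being parallelized concrete, the following is a minimal single-machine collapsed Gibbs sampler for LDA. It is a sketch of the standard algorithm only, not the distributed method described above: the function name, parameters, and toy corpus are illustrative assumptions, and none of the four scalability strategies (data placement, pipelining, word bundling, priority scheduling) are implemented here.

```python
import numpy as np

def gibbs_lda(docs, n_topics, n_words, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (single machine, illustrative).

    docs: list of documents, each a list of integer word ids in [0, n_words).
    Returns the doc-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))   # per-document topic counts
    nkw = np.zeros((n_topics, n_words))  # per-topic word counts
    nk = np.zeros(n_topics)              # total tokens assigned to each topic
    z = []                               # topic assignment for every token

    # Random initialization of topic assignments and count matrices.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    # Sweep over every token, resampling its topic from the full conditional.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # p(z = k | rest) ∝ (ndk + alpha) * (nkw + beta) / (nk + W*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

The memory and communication bottlenecks in the abstract arise precisely because the `nkw` topic-word matrix must be shared and kept (approximately) consistent across machines when this loop is distributed over documents.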