This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provides fault recovery for lengthy distributed computations. We show that PLDA can be applied to large, real-world applications and achieves good scalability. We have released MPI-PLDA as open source at http://code.google.com/p/plda under the Apache License.
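
The abstract does not say how the sampling work is divided across machines. One common scheme for parallel LDA is approximate distributed Gibbs sampling: each worker holds a shard of documents, samples against a local copy of the word-topic counts, and the per-worker count changes are then summed into the shared counts (an all-reduce under MPI, a reduce phase under MapReduce). The sketch below assumes that scheme and simulates the workers sequentially in one process. The function names, hyperparameter defaults, and data layout are illustrative and are not taken from MPI-PLDA.

```python
import random


def gibbs_sweep(docs, z, doc_topic, word_topic, topic_total, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over a shard; all counts are updated in place."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            doc_topic[d][k] -= 1
            word_topic[w][k] -= 1
            topic_total[k] -= 1
            # Unnormalized conditional probability of each topic for this token.
            weights = [
                (doc_topic[d][t] + alpha)
                * (word_topic[w][t] + beta)
                / (topic_total[t] + V * beta)
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1


def parallel_lda(shards, K, V, iterations=20, alpha=0.1, beta=0.01, seed=0):
    """Simulated data-parallel LDA. shards[s][d] is a list of word ids in [0, V)."""
    rng = random.Random(seed)
    # Random initial topic assignment for every token.
    z = [[[rng.randrange(K) for _ in doc] for doc in shard] for shard in shards]
    doc_topic = [[[0] * K for _ in shard] for shard in shards]
    word_topic = [[0] * K for _ in range(V)]
    for s, shard in enumerate(shards):
        for d, doc in enumerate(shard):
            for i, w in enumerate(doc):
                doc_topic[s][d][z[s][d][i]] += 1
                word_topic[w][z[s][d][i]] += 1

    for _ in range(iterations):
        deltas = []
        # Each pass of this loop stands in for one worker process.
        for s, shard in enumerate(shards):
            local = [row[:] for row in word_topic]  # worker's private copy
            local_total = [sum(local[w][t] for w in range(V)) for t in range(K)]
            gibbs_sweep(shard, z[s], doc_topic[s], local, local_total,
                        K, V, alpha, beta, rng)
            deltas.append([[local[w][t] - word_topic[w][t] for t in range(K)]
                           for w in range(V)])
        # Reduce step: fold every worker's count changes into the shared counts.
        for delta in deltas:
            for w in range(V):
                for t in range(K):
                    word_topic[w][t] += delta[w][t]
    return word_topic, doc_topic
```

In this scheme only the word-topic counts cross worker boundaries; the document-topic counts and token assignments stay with the shard that owns the documents, which is what keeps per-machine storage bounded as the corpus grows.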