In recent years, probabilistic topic models have been applied to many kinds of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet Allocation (LDA) is one of the best-known topic models. Variational Bayesian inference or collapsed Gibbs sampling is typically employed to estimate the parameters of LDA; however, these inference methods are computationally expensive on large-scale data, so efficient techniques are needed. In this paper, we apply parallel computation to make collapsed Gibbs sampling inference for LDA efficient. We target a cluster of shared-memory (SMP) nodes, an architecture that has become widespread in recent years. Prior work on parallel inference for LDA has used either MPI or OpenMP alone. For an SMP cluster, however, hybrid parallelization is a better fit: message passing for communication between SMP nodes, and loop directives for parallelization within each node. We therefore developed an MPI/OpenMP hybrid parallel inference method for LDA, and achieved substantial speedups under various configurations of an SMP cluster.
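For reference, the following is a minimal serial sketch of the collapsed Gibbs sampling update for LDA that the paper parallelizes; the function name and hyperparameter defaults are illustrative, not taken from the paper. In the hybrid scheme described above, the outer loop over documents would be partitioned across MPI processes and OpenMP threads, with the global topic-word counts periodically synchronized between nodes (e.g., via an allreduce-style exchange).

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Serial collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    Returns the document-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts (shared state in parallel versions)
    nk = np.zeros(K)         # per-topic totals
    z = []                   # current topic assignment of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

    for _ in range(iters):
        # This document loop is the natural unit of parallel work:
        # split across MPI processes between nodes and OpenMP threads within a node.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                # Collapsed conditional: p(t | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                t = rng.choice(K, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = t
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    return ndk, nkw
```

In a parallel setting, each worker samples its share of documents against a local copy of `nkw`, and the copies are reconciled after each sweep; the count-increment/decrement invariant above is what makes that reconciliation a simple sum of local deltas.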