In recent years, probabilistic topic models have been applied to many kinds of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet Allocation (LDA) is one of the best-known topic models. Variational Bayesian inference or collapsed Gibbs sampling is typically employed to estimate the parameters of LDA; however, these inference methods are computationally expensive on large-scale data, so efficient techniques are needed. In this paper, we apply parallel computation to make collapsed Gibbs sampling inference for LDA efficient. We target a cluster of shared-memory (SMP) nodes, an architecture that has become widespread in recent years. Prior work on parallel inference for LDA has used either MPI or OpenMP alone. For an SMP cluster, however, hybrid parallelization is a better fit: message passing for communication between SMP nodes, and loop directives for parallelization within each node. We therefore developed an MPI/OpenMP hybrid parallel inference method for LDA, and achieved substantial speedups under various configurations of an SMP cluster.
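For reference, the following is a minimal serial sketch of the collapsed Gibbs sampling update for LDA that the paper parallelizes; the function name and hyperparameter defaults are illustrative, not taken from the paper. In the hybrid scheme described above, the outer loop over documents would be partitioned across MPI processes and OpenMP threads, with the global topic-word counts periodically synchronized between nodes (e.g., via an allreduce-style exchange).

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Serial collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    Returns the document-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts (shared state in parallel versions)
    nk = np.zeros(K)         # per-topic totals
    z = []                   # current topic assignment of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

    for _ in range(iters):
        # This document loop is the natural unit of parallel work:
        # split across MPI processes between nodes and OpenMP threads within a node.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                # Collapsed conditional: p(t | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                t = rng.choice(K, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = t
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    return ndk, nkw
```

In a parallel setting, each worker samples its share of documents against a local copy of `nkw`, and the copies are reconciled after each sweep; the count-increment/decrement invariant above is what makes that reconciliation a simple sum of local deltas.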