MPI/OpenMP hybrid parallel inference for Latent Dirichlet Allocation

  • Authors:
  • Shotaro Tora, Koji Eguchi

  • Affiliation:
  • Kobe University, Rokkodai, Nada, Kobe, Japan (both authors)

  • Venue:
  • Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications
  • Year:
  • 2011

Abstract

In recent years, probabilistic topic models have been applied to various kinds of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet Allocation (LDA) is one of the best-known topic models. Variational Bayesian inference or collapsed Gibbs sampling is often employed to estimate the parameters of LDA; however, these inference methods incur high computational cost on large-scale data, so efficient inference techniques are needed. In this paper, we use parallel computation to make collapsed Gibbs sampling inference for LDA efficient. We target a shared-memory cluster (SMP cluster), an architecture that has become widely used in recent years. Prior work on parallel inference for LDA has used either MPI or OpenMP alone. For an SMP cluster, however, it is more suitable to adopt hybrid parallelization, which uses message passing for communication between SMP nodes and loop directives for parallelization within each SMP node. In this paper, we develop an MPI/OpenMP hybrid parallel inference method for LDA and achieve remarkable speedup under various configurations of an SMP cluster.