PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing

  • Authors:
  • Zhiyuan Liu; Yuzhou Zhang; Edward Y. Chang; Maosong Sun

  • Affiliations:
  • Google Inc., China; Google Inc., China; Google Inc., China; Tsinghua University

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST)
  • Year:
  • 2011

Abstract

Previous methods of distributed Gibbs sampling for latent Dirichlet allocation (LDA) run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that these strategies substantially reduce the otherwise unparallelizable communication overhead and achieve good load balancing, thereby improving the scalability of LDA.
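The pipeline-processing idea can be illustrated with a small sketch: while a sampler works on the current bundle of words, the topic counts needed for the next bundle are fetched in the background, so communication overlaps with computation. This is a hypothetical simplification, not the paper's implementation; the names `fetch`, `sample`, and `update` are assumed placeholders for the real fetch/sample/update operations against the distributed word-topic matrix.

```python
from concurrent.futures import ThreadPoolExecutor

def pipeline_gibbs(bundles, fetch, sample, update):
    """Hypothetical sketch of pipelined Gibbs sampling.

    While the sampler processes bundle i, a background I/O thread
    prefetches the word-topic counts for bundle i+1, hiding
    communication latency behind computation.
    """
    if not bundles:
        return
    with ThreadPoolExecutor(max_workers=1) as io:
        future = io.submit(fetch, bundles[0])  # prefetch the first bundle
        for i, bundle in enumerate(bundles):
            counts = future.result()           # wait for this bundle's counts
            if i + 1 < len(bundles):
                # start fetching the next bundle before sampling this one
                future = io.submit(fetch, bundles[i + 1])
            update(bundle, sample(bundle, counts))
```

In a real deployment the fetch and update calls would be remote requests to the machines holding the relevant slices of the word-topic count matrix; here they are just callables so the overlap structure is visible.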