PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications

  • Authors:
  • Yi Wang; Hongjie Bai; Matt Stanton; Wen-Yen Chen; Edward Y. Chang

  • Affiliations:
  • Google Beijing Research, Beijing, China 100084; Google Beijing Research, Beijing, China 100084; Computer Science, CMU, USA; Google Beijing Research, Beijing, China 100084; Google Beijing Research, Beijing, China 100084

  • Venue:
  • AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
  • Year:
  • 2009

Abstract

This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provides fault recovery for lengthy distributed computations. We show that PLDA can be applied to large, real-world applications and achieves good scalability. We have released MPI-PLDA as open source at http://code.google.com/p/plda under the Apache License.
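
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to parallelize LDA. It assumes a data-parallel collapsed Gibbs sampler: documents are split into shards, each worker samples its shard against a private copy of the word-topic counts, and the per-worker count changes are then summed, as an all-reduce step would do. The workers are simulated sequentially in a single process rather than with MPI or MapReduce, and every function name here is hypothetical, not taken from the PLDA code base.

```python
import random


def init_state(docs, num_topics, rng):
    """Assign a random topic to every token and build the count tables."""
    vocab_size = 1 + max(w for doc in docs for w in doc)
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = [[0] * num_topics for _ in range(vocab_size)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = assignments[d][i]
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1
    return assignments, doc_topic, word_topic, topic_total


def sample_shard(doc_ids, docs, state, alpha, beta, rng):
    """One Gibbs sweep over a shard; returns this worker's word-topic delta."""
    assignments, doc_topic, word_topic, topic_total = state
    vocab_size, num_topics = len(word_topic), len(topic_total)
    # Private copies: the shared tables stay frozen until the merge step.
    local_wt = [row[:] for row in word_topic]
    local_tt = topic_total[:]
    for d in doc_ids:
        for i, w in enumerate(docs[d]):
            old = assignments[d][i]
            doc_topic[d][old] -= 1
            local_wt[w][old] -= 1
            local_tt[old] -= 1
            # Collapsed Gibbs conditional, up to a normalizing constant.
            weights = [
                (doc_topic[d][k] + alpha)
                * (local_wt[w][k] + beta)
                / (local_tt[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            new = rng.choices(range(num_topics), weights=weights)[0]
            assignments[d][i] = new
            doc_topic[d][new] += 1
            local_wt[w][new] += 1
            local_tt[new] += 1
    return [
        [local_wt[w][k] - word_topic[w][k] for k in range(num_topics)]
        for w in range(vocab_size)
    ]


def parallel_iteration(docs, state, num_workers, alpha, beta, rng):
    """Simulate one iteration: sample every shard, then merge the deltas."""
    _, _, word_topic, topic_total = state
    shards = [range(p, len(docs), num_workers) for p in range(num_workers)]
    deltas = [sample_shard(s, docs, state, alpha, beta, rng) for s in shards]
    # Merge step: summing the deltas plays the role of an all-reduce.
    for delta in deltas:
        for w, row in enumerate(delta):
            for k, change in enumerate(row):
                word_topic[w][k] += change
                topic_total[k] += change
```

Calling `init_state` once and then `parallel_iteration` repeatedly on a list of documents (each a list of integer word ids) runs the simulated sampler. Because each document belongs to exactly one shard, the document-topic counts need no merging; only the word-topic counts are shared, which is why they are the quantity exchanged between workers in this sketch.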