Parallelization and Characterization of Probabilistic Latent Semantic Analysis

Authors:
Chuntao Hong; Wenguang Chen; Weimin Zheng; Jiulong Shan; Yurong Chen; Yimin Zhang
Venue:
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Year:
2008

Cited by (2):

Efficient Probabilistic Latent Semantic Analysis through Parallelization
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology

P2LSA and P2LSA+: two paralleled probabilistic latent semantic analysis algorithms based on the mapreduce model
IDEAL '11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning

Abstract

Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for analyzing two-mode and co-occurrence data. It has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. However, PLSA is rarely applied to large datasets because of its high computational complexity. This paper presents an optimized and parallelized implementation of PLSA that can process datasets of 10,000 documents in seconds. Compared with the baseline program, our parallel program achieves a speedup of more than 6 on an eight-processor machine. We also characterize the parallel program: the performance analysis shows that it is memory intensive and that limited memory bandwidth is the bottleneck preventing further speedup.
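The abstract does not describe the algorithm itself. The following is an illustration only and is not the authors' implementation. It is a minimal sketch of the standard EM iteration for the asymmetric PLSA model, P(w|d) = sum_z P(w|z) P(z|d), with documents split across workers. Each worker computes the posteriors P(z|d,w) and partial expected counts for its own documents. A reduce step then sums those counts and renormalizes P(w|z). Several details are assumptions made for this example: the function names `plsa`, `partial_em`, and `normalized`, the round-robin document partition, and the sparse `{word_id: count}` corpus format. Threads are used only to show the partition/reduce structure. In CPython the GIL prevents them from delivering real CPU speedup, so a practical version would need processes or native code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def normalized(row):
    """Scale a list of non-negative weights so it sums to 1."""
    total = sum(row) or 1.0  # guard against an all-zero row
    return [x / total for x in row]


def partial_em(docs, theta, phi, num_topics, vocab_size):
    """E-step plus local statistics for one worker's slice of documents.

    docs: list of (doc_id, {word_id: count}) pairs (assumed format).
    Returns updated P(z|d) rows for these documents and this slice's
    expected counts sum_d n(d,w) P(z|d,w), used to update P(w|z).
    """
    phi_counts = [[0.0] * vocab_size for _ in range(num_topics)]
    new_theta = {}
    for d, counts in docs:
        topic_mass = [0.0] * num_topics
        for w, n in counts.items():
            # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
            post = normalized([phi[z][w] * theta[d][z] for z in range(num_topics)])
            for z in range(num_topics):
                r = n * post[z]  # expected occurrences of w in d assigned to z
                topic_mass[z] += r
                phi_counts[z][w] += r
        # M-step for P(z|d) needs only this document's own expected counts.
        new_theta[d] = normalized(topic_mass)
    return new_theta, phi_counts


def plsa(corpus, num_topics, vocab_size, iters=50, workers=4, seed=0):
    """Fit asymmetric PLSA by EM, partitioning documents across workers."""
    rng = random.Random(seed)
    theta = [normalized([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in corpus]
    phi = [normalized([rng.random() + 0.1 for _ in range(vocab_size)])
           for _ in range(num_topics)]
    # Hypothetical round-robin partition: worker i owns docs i, i + workers, ...
    docs = list(enumerate(corpus))
    chunks = [docs[i::workers] for i in range(workers)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            # Map: every worker reads the shared theta/phi from the last iteration.
            results = list(pool.map(
                lambda chunk: partial_em(chunk, theta, phi, num_topics, vocab_size),
                chunks))
            # Reduce: merge theta rows and sum the partial topic-word counts.
            totals = [[0.0] * vocab_size for _ in range(num_topics)]
            for new_theta, counts in results:
                for d, row in new_theta.items():
                    theta[d] = row
                for z in range(num_topics):
                    for w in range(vocab_size):
                        totals[z][w] += counts[z][w]
            # M-step for P(w|z): renormalize the global counts per topic.
            phi = [normalized(row) for row in totals]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: documents 0-1 use words {0, 1}; documents 2-3 use words {2, 3}.
    corpus = [{0: 4, 1: 3}, {0: 3, 1: 5}, {2: 4, 3: 4}, {2: 5, 3: 2}]
    theta, phi = plsa(corpus, num_topics=2, vocab_size=4)
    for d, row in enumerate(theta):
        print(d, [round(p, 3) for p in row])
```

Each E-step touches every (document, word, topic) triple and streams the full P(w|z) and P(z|d) tables through memory. That pattern is consistent with the abstract's observation that a parallel PLSA program is memory intensive, although this sketch makes no claim about the paper's actual data layout.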