Parallel latent semantic analysis using a graphics processing unit

  • Authors:
  • Joseph M. Cavanagh; Thomas E. Potok; Xiaohui Cui

  • Affiliations:
  • University of Minnesota - Morris, Morris, MN, USA; Oak Ridge National Laboratory, Oak Ridge, TN, USA; Oak Ridge National Laboratory, Oak Ridge, TN, USA

  • Venue:
  • Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
  • Year:
  • 2009

Abstract

Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large term-document datasets using Singular Value Decomposition (SVD). However, with the ever-expanding size of data sets, current implementations are not fast enough to compute the results quickly and easily on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more cost-effective choice than a computer cluster. In this paper, we present a parallel LSA implementation on the GPU, using the NVIDIA® Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS). The performance of this implementation is compared to a traditional LSA implementation on the CPU using an optimized Basic Linear Algebra Subprograms (BLAS) library. For large matrices with dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version.
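The abstract centers on offloading the dense linear algebra behind SVD-based LSA to the GPU via CUBLAS. The sketch below is not the authors' code; it is a minimal illustration of the kind of cuBLAS call such a pipeline relies on: projecting a term-document matrix A (m x n) onto k latent dimensions with a single-precision matrix multiply. The matrix names, sizes, and dummy values are illustrative assumptions, and the dimensions are padded to multiples of 16 in line with the abstract's observation about GPU-friendly sizes.

// Minimal cuBLAS sketch (assumed example, not the paper's implementation):
// compute C = A * V, i.e. project each document vector onto k latent dimensions.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int m = 1024, n = 1024, k = 64;   // sizes divisible by 16 (illustrative)

    // Host matrices in column-major order (cuBLAS convention), filled with dummy values.
    std::vector<float> hA(m * n, 1.0f), hV(n * k, 0.5f), hC(m * k, 0.0f);

    // Allocate device buffers and copy the inputs over.
    float *dA, *dV, *dC;
    cudaMalloc(&dA, m * n * sizeof(float));
    cudaMalloc(&dV, n * k * sizeof(float));
    cudaMalloc(&dC, m * k * sizeof(float));
    cudaMemcpy(dA, hA.data(), m * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dV, hV.data(), n * k * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C(m x k) = 1.0 * A(m x n) * V(n x k) + 0.0 * C
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, k, n,
                &alpha, dA, m, dV, n,
                &beta,  dC, m);

    // Copy the reduced representation back and spot-check one entry.
    cudaMemcpy(hC.data(), dC, m * k * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);   // expect n * 1.0 * 0.5 = 512 for this dummy data

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dV); cudaFree(dC);
    return 0;
}

In a full LSA pipeline this kind of GEMM would be invoked repeatedly inside the SVD computation and the subsequent rank-k projection; the point of the sketch is only to show where CUBLAS replaces the CPU BLAS calls that the paper benchmarks against.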