Implementing random indexing on GPU

Authors:
Lukas Polok; Pavel Smrz
Affiliations:
Brno University of Technology, Bozetechova, Brno, Czech Republic; Brno University of Technology, Bozetechova, Brno, Czech Republic
Venue:
Proceedings of the 19th High Performance Computing Symposia
Year:
2011

Abstract

Vector space models have received significant attention in recent years. They have been applied in a wide range of areas, including information filtering, information retrieval, document indexing, and relevance ranking. Random indexing is one method that uses distributional statistics of term co-occurrences to generate vector space models from a set of documents. When the document collection is large, significant computational power is required to compute the results. This paper presents an implementation of the random indexing method on the GPU that allows efficient training on large datasets. It is limited only by the amount of memory available on the GPU, and various ways to overcome this dependence on GPU memory are discussed. Speedups on the order of tens are achieved for training from random seed vectors, and considerably higher speedups for retraining. The implementation scales well with both the term vector dimension and the seed length.
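
To make the terms "seed vector", "seed length", and "term vector dimension" concrete, the following is a minimal CPU-only sketch of random indexing. It assumes the common sliding-window formulation, in which each term receives a sparse random seed vector of +1/-1 entries and each term vector accumulates the seed vectors of neighbouring terms. The abstract does not specify the paper's exact variant or its GPU kernels, so the function names (`make_seed_vector`, `train_random_indexing`, `cosine`), the `window` parameter, and the default values are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def make_seed_vector(dim, seed_length, rng):
    """Sparse ternary seed vector stored as {position: +1 or -1}.

    Assumption: seed_length nonzero entries, alternating +1 and -1.
    """
    positions = rng.sample(range(dim), seed_length)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def train_random_indexing(documents, dim=64, seed_length=4, window=2, rng_seed=0):
    """Build term vectors by summing the seed vectors of co-occurring terms.

    documents: list of token lists. Returns (seeds, vectors).
    """
    rng = random.Random(rng_seed)
    seeds = {}
    vectors = {}

    # Assign every distinct term a random seed vector and a zero term vector.
    for doc in documents:
        for term in doc:
            if term not in seeds:
                seeds[term] = make_seed_vector(dim, seed_length, rng)
                vectors[term] = [0] * dim

    # Accumulate the seeds of neighbours inside the sliding window.
    for doc in documents:
        for i, term in enumerate(doc):
            lo = max(0, i - window)
            hi = min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in seeds[doc[j]].items():
                    vectors[term][pos] += sign

    return seeds, vectors


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch the work per token is proportional to the window size times the seed length, and memory grows with the vocabulary size times the term vector dimension. That is consistent with the abstract's remarks that available memory is the limiting factor and that scaling is reported against the term vector dimension and the seed length.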