Implementing random indexing on GPU

  • Authors:
  • Lukas Polok; Pavel Smrz

  • Affiliations:
  • Brno University of Technology, Bozetechova, Brno, Czech Republic;Brno University of Technology, Bozetechova, Brno, Czech Republic

  • Venue:
  • Proceedings of the 19th High Performance Computing Symposia
  • Year:
  • 2011

Abstract

Vector space models have received significant attention in recent years. They have been applied in a wide spectrum of areas, including information filtering, information retrieval, document indexing, and relevancy ranking. Random indexing is one of the methods that use distributional statistics of term co-occurrences to generate vector space models from a set of documents. If the document collection is large, significant computational power is required to compute the results. This paper presents an implementation of the random indexing method on the GPU that allows efficient training on large datasets. It is limited only by the amount of memory available on the GPU, and various ways to overcome this dependence on GPU memory are discussed. Speedups on the order of tens are achieved for training from random seed vectors, and considerably higher figures for retraining. The implementation scales well with both the term vector dimension and the seed length.
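
To make the two parameters named in the abstract concrete (term vector dimension and seed length), the following is a minimal CPU-only sketch of generic random indexing in Python. It is not the authors' GPU implementation. The function names, the sliding-window definition of context, the ternary (+1/-1) seed layout, and the default parameter values are all assumptions chosen for illustration; the paper may define contexts and seeds differently.

```python
import random


def make_seed_vector(dim, seed_length, rng):
    """Build a sparse seed as (position, sign) pairs.

    Assumption: seed_length is even, with half the nonzero entries +1
    and half -1, placed at distinct random positions.
    """
    positions = rng.sample(range(dim), seed_length)
    return [(pos, 1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)]


def train_random_indexing(documents, dim=64, seed_length=8, window=2, rng_seed=0):
    """Accumulate term vectors from the seeds of neighbouring terms.

    documents: list of token lists.
    Returns (term_vectors, seeds), both keyed by term.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({term for doc in documents for term in doc})
    seeds = {term: make_seed_vector(dim, seed_length, rng) for term in vocab}
    term_vectors = {term: [0] * dim for term in vocab}

    for doc in documents:
        for i, term in enumerate(doc):
            lo = max(0, i - window)
            hi = min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Add the neighbour's sparse seed into this term's vector.
                for pos, sign in seeds[doc[j]]:
                    term_vectors[term][pos] += sign
    return term_vectors, seeds


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = [["cat", "sat"], ["dog", "sat"]]
    vectors, _ = train_random_indexing(docs, dim=64, seed_length=8, window=1)
    # "cat" and "dog" share the same single neighbour, so they end up
    # with the same term vector in this toy corpus.
    print(cosine(vectors["cat"], vectors["dog"]))
```

In this sketch each co-occurrence costs seed_length additions regardless of the term vector dimension, which is why the seed is stored sparsely. The inner accumulation loop is the part a parallel implementation would most naturally distribute, although how the paper maps it onto the GPU is not stated in the abstract.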