Vector space models have received significant attention in recent years. They have been applied in a wide range of areas, including information filtering, information retrieval, document indexing and relevance ranking. Random indexing is one of the methods that use distributional statistics of term co-occurrences to generate vector space models from a set of documents. For large document collections, computing the results requires significant computational power. This paper presents an efficient GPU implementation of the random indexing method that enables training on large datasets; its only limitation is the amount of memory available on the GPU. Various ways to overcome this dependence on GPU memory are discussed. Speedups of tens of times are achieved for training from random seed vectors, and considerably higher speedups for retraining. The implementation scales well with both the term vector dimension and the seed length.
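To make "training from random seed vectors" concrete, here is a minimal CPU sketch of random indexing in plain Python. It is not the paper's GPU implementation. It assumes sparse ternary seed vectors: each term gets `seed_length` randomly chosen positions set to +1 or -1 in a vector of length `dim`, matching the "seed length" and "term vector dimension" parameters named above. It also assumes co-occurrence is counted in a symmetric sliding window of `window` tokens. The function names and the windowing choice are hypothetical illustrations. The accumulation loop, which adds each neighbour's seed into a term's vector, is the work a GPU version would parallelize.

```python
import math
import random
from collections import defaultdict


def make_seed_vector(dim, seed_length, rng):
    """Hypothetical helper: a sparse ternary seed with `seed_length` nonzeros (+1/-1)."""
    positions = rng.sample(range(dim), seed_length)
    return {p: rng.choice((1, -1)) for p in positions}


def train_random_indexing(documents, dim=512, seed_length=8, window=2, rng_seed=0):
    """Hypothetical sketch: accumulate neighbours' seed vectors into dense term vectors.

    Assumes whitespace tokenization and a symmetric co-occurrence window.
    """
    rng = random.Random(rng_seed)
    seeds = {}
    term_vectors = defaultdict(lambda: [0] * dim)
    for doc in documents:
        tokens = doc.lower().split()
        # Assign a random seed to every term seen for the first time.
        for tok in tokens:
            if tok not in seeds:
                seeds[tok] = make_seed_vector(dim, seed_length, rng)
        # Add the seed of every neighbour inside the window to the term's vector.
        for i, tok in enumerate(tokens):
            vec = term_vectors[tok]
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in seeds[tokens[j]].items():
                    vec[pos] += sign
    return seeds, dict(term_vectors)


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, after `seeds, vecs = train_random_indexing(docs)`, calling `cosine(vecs["gpu"], vecs["cuda"])` compares two terms by the contexts they share. Because each seed touches only `seed_length` positions, the cost of one accumulation grows with the seed length rather than with `dim`. This is consistent with the abstract's note that performance depends on both parameters.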