Collaborative Filtering (CF) is an important technique for recommendation systems, which model and analyze customer preferences in order to give reasonable recommendations. Recently, many applications based on the Restricted Boltzmann Machine (RBM) have been developed for a large variety of learning problems. The RBM-based model for Collaborative Filtering (RBM-CF) can handle large-scale data sets and achieves good recommendation performance. However, the computation of RBM becomes problematic when a large number of hidden features is used to improve recommendation accuracy. Although RBM has great potential for parallelism, developing a parallel implementation of RBM-CF on GPU remains a challenge, since the data sets for CF are typically large and sparse. In this paper, we propose a parallel implementation of RBM-CF on GPU using CUDA. We first show how to transform the computation of RBM-CF into matrix-based operations on GPU, and then design three CUDA kernels for sparse matrix-matrix multiplication to further improve the computational efficiency of RBM-CF when modeling large-scale, sparse data sets. Experimental results show that our parallel implementation achieves significant speedups on GPU.
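To illustrate the matrix-based formulation the abstract refers to, the sketch below computes the RBM positive phase (hidden-unit activation probabilities) as a single sparse-dense matrix product, which is exactly the kind of operation the paper's CUDA kernels accelerate. This is a minimal CPU sketch using SciPy, not the authors' implementation; the binary rating matrix, the weight initialization, and all variable names (`V`, `W`, `b_h`) are illustrative assumptions, and the softmax visible units of the full RBM-CF model are collapsed into a single binary unit per item for brevity.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_users, n_items, n_hidden = 4, 6, 3

# Hypothetical sparse "rated" matrix (users x items). Real RBM-CF uses
# K softmax visible units per observed rating; here each observed entry
# is simply 1 to keep the sketch short.
V = sparse.random(n_users, n_items, density=0.3, random_state=0, format="csr")
V.data[:] = 1.0

W = rng.standard_normal((n_items, n_hidden)) * 0.01  # item-to-hidden weights
b_h = np.zeros(n_hidden)                             # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Positive phase: p(h_j = 1 | V) for every user, computed as one
# sparse-dense matrix multiplication followed by an elementwise sigmoid.
p_h = sigmoid(V @ W + b_h)
```

Because `V` is stored in CSR format, the product touches only the observed ratings, which is why a sparse kernel pays off on the large, sparse data sets typical of CF.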