Training products of experts by minimizing contrastive divergence. Neural Computation.
Feature selection, L1 vs. L2 regularization, and rotational invariance. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Scaling to very very large corpora for natural language disambiguation. ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics.
A fast learning algorithm for deep belief nets. Neural Computation.
Learning sparse overcomplete codes for images. Journal of VLSI Signal Processing Systems.
MapReduce: simplified data processing on large clusters. OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation.
Scalable training of L1-regularized log-linear models. Proceedings of the 24th International Conference on Machine Learning.
Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th International Conference on Machine Learning.
Many-core GPU computing with NVIDIA CUDA. Proceedings of the 22nd Annual International Conference on Supercomputing.
High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software (TOMS).
Fast support vector machine training and classification on graphics processors. Proceedings of the 25th International Conference on Machine Learning.
Semi-supervised learning of compact document representations with deep networks. Proceedings of the 25th International Conference on Machine Learning.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Power-constrained CMOS scaling limits. IBM Journal of Research and Development.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
A dynamically configurable coprocessor for convolutional neural networks. Proceedings of the 37th Annual International Symposium on Computer Architecture.
A programmable parallel accelerator for learning and classification. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques.
High-performance reconfigurable hardware architecture for restricted Boltzmann machines. IEEE Transactions on Neural Networks.
Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems.
Building a multi-FPGA virtualized restricted Boltzmann machine architecture using embedded MPI. Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays.
Discriminative deep belief networks for visual data classification. Pattern Recognition.
Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM.
A massively parallel, energy-efficient programmable accelerator for learning and classification. ACM Transactions on Architecture and Code Optimization (TACO).
Learning a generative model of images by factoring appearance and shape. Neural Computation.
GPU-accelerated restricted Boltzmann machine for collaborative filtering. ICA3PP '12: Proceedings of the 12th International Conference on Algorithms and Architectures for Parallel Processing, Part I.
Fast on-line statistical learning on a GPGPU. AusPDC '11: Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing.
Deep learning of representations: looking forward. SLSP '13: Proceedings of the First International Conference on Statistical Language and Speech Processing.
ACM Transactions on Reconfigurable Technology and Systems (TRETS).
The Shape Boltzmann Machine: a strong model of object shape. International Journal of Computer Vision.
The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models or to use fewer training examples. In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors, and we show that these principles can be applied to successfully scale up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm that yields a 5- to 15-fold speedup over previous methods.
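To illustrate why graphics processors suit DBN learning so well (this is a sketch, not the paper's implementation): each layer of a DBN is trained as a restricted Boltzmann machine with contrastive divergence, and a single CD-1 update over a mini-batch is dominated by dense matrix products, exactly the operations GPU BLAS libraries accelerate. The function name, learning rate, and layer sizes below are illustrative choices, written here with NumPy for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, bv, bh, V, lr=0.01):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    V : (batch, n_visible) mini-batch of training examples.
    W : (n_visible, n_hidden) weights; bv, bh : visible/hidden biases.
    Every expensive step is a dense matrix multiplication, which is
    why the same computation maps directly onto GPU BLAS kernels.
    """
    # Positive phase: hidden probabilities and a sampled hidden state.
    Ph = sigmoid(V @ W + bh)
    H = (rng.random(Ph.shape) < Ph).astype(V.dtype)

    # Negative phase: one Gibbs step to reconstruct the visible units.
    Pv = sigmoid(H @ W.T + bv)
    Ph2 = sigmoid(Pv @ W + bh)

    # Update from the difference of data and model correlations.
    n = V.shape[0]
    W += lr * (V.T @ Ph - Pv.T @ Ph2) / n
    bv += lr * (V - Pv).mean(axis=0)
    bh += lr * (Ph - Ph2).mean(axis=0)
    return W, bv, bh
```

At the 100-million-parameter scale reported above, the `V.T @ Ph` and `Pv.T @ Ph2` products are the bottleneck; moving them from CPU BLAS to GPU BLAS is the essential source of the reported speedups.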