Training products of experts by minimizing contrastive divergence. Neural Computation.
Feature selection, L1 vs. L2 regularization, and rotational invariance. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Scaling to very very large corpora for natural language disambiguation. ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics.
A fast learning algorithm for deep belief nets. Neural Computation.
Learning sparse overcomplete codes for images. Journal of VLSI Signal Processing Systems.
MapReduce: simplified data processing on large clusters. OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation.
Scalable training of L1-regularized log-linear models. Proceedings of the 24th International Conference on Machine Learning.
Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th International Conference on Machine Learning.
Many-core GPU computing with NVIDIA CUDA. Proceedings of the 22nd Annual International Conference on Supercomputing.
High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software (TOMS).
Fast support vector machine training and classification on graphics processors. Proceedings of the 25th International Conference on Machine Learning.
Semi-supervised learning of compact document representations with deep networks. Proceedings of the 25th International Conference on Machine Learning.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Power-constrained CMOS scaling limits. IBM Journal of Research and Development.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
A dynamically configurable coprocessor for convolutional neural networks. Proceedings of the 37th Annual International Symposium on Computer Architecture.
A programmable parallel accelerator for learning and classification. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques.
High-performance reconfigurable hardware architecture for restricted Boltzmann machines. IEEE Transactions on Neural Networks.
Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems.
Building a multi-FPGA virtualized restricted Boltzmann machine architecture using embedded MPI. Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays.
Discriminative deep belief networks for visual data classification. Pattern Recognition.
Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM.
A massively parallel, energy-efficient programmable accelerator for learning and classification. ACM Transactions on Architecture and Code Optimization (TACO).
Learning a generative model of images by factoring appearance and shape. Neural Computation.
GPU-accelerated restricted Boltzmann machine for collaborative filtering. ICA3PP '12: Proceedings of the 12th International Conference on Algorithms and Architectures for Parallel Processing, Part I.
Fast on-line statistical learning on a GPGPU. AusPDC '11: Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing.
Deep learning of representations: looking forward. SLSP '13: Proceedings of the First International Conference on Statistical Language and Speech Processing.
ACM Transactions on Reconfigurable Technology and Systems (TRETS).
The Shape Boltzmann Machine: a strong model of object shape. International Journal of Computer Vision.
The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models or to use fewer training examples. In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors, and we show that these principles can be applied to successfully scale up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm that yields a 5- to 15-fold speedup over previous methods.
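To illustrate why graphics processors suit DBN learning so well (this is a sketch, not the paper's implementation): each layer of a DBN is trained as a restricted Boltzmann machine with contrastive divergence, and a single CD-1 update over a mini-batch is dominated by dense matrix products, exactly the operations GPU BLAS libraries accelerate. The function name, learning rate, and layer sizes below are illustrative choices, written here with NumPy for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, bv, bh, V, lr=0.01):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    V : (batch, n_visible) mini-batch of training examples.
    W : (n_visible, n_hidden) weights; bv, bh : visible/hidden biases.
    Every expensive step is a dense matrix multiplication, which is
    why the same computation maps directly onto GPU BLAS kernels.
    """
    # Positive phase: hidden probabilities and a sampled hidden state.
    Ph = sigmoid(V @ W + bh)
    H = (rng.random(Ph.shape) < Ph).astype(V.dtype)

    # Negative phase: one Gibbs step to reconstruct the visible units.
    Pv = sigmoid(H @ W.T + bv)
    Ph2 = sigmoid(Pv @ W + bh)

    # Update from the difference of data and model correlations.
    n = V.shape[0]
    W += lr * (V.T @ Ph - Pv.T @ Ph2) / n
    bv += lr * (V - Pv).mean(axis=0)
    bh += lr * (Ph - Ph2).mean(axis=0)
    return W, bv, bh
```

At the 100-million-parameter scale reported above, the `V.T @ Ph` and `Pv.T @ Ph2` products are the bottleneck; moving them from CPU BLAS to GPU BLAS is the essential source of the reported speedups.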