In this paper, we focus on two complementary approaches to significantly decrease the pre-training time of a deep belief network (DBN). First, we propose an adaptive step size technique that enhances the convergence of the contrastive divergence (CD) algorithm, reducing the number of epochs required to train the restricted Boltzmann machines (RBMs) that form the DBN infrastructure. Second, we present a highly scalable graphics processing unit (GPU) parallel implementation of the CD-k algorithm, which notably boosts training speed. Additionally, extensive experiments are conducted on the MNIST and HHreco databases. The results suggest that the maximum useful depth of a DBN is related to the number and quality of the training samples. Moreover, the lower-level layer was found to play a fundamental role in building successful DBN models. Furthermore, the results contradict the preconceived idea that all layers should be pre-trained. Finally, it is shown that by incorporating multiple back-propagation (MBP) layers, the generalization capability of DBNs is remarkably improved.
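To make the two ideas above concrete, the following is a minimal NumPy sketch of CD-1 training for a binary RBM with an adaptive step size. The update rule shown (grow the rate when reconstruction error falls, shrink it otherwise) is an illustrative heuristic, not the paper's specific technique; the function and parameter names (`cd1_step`, `train_rbm`, `lr`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_v, b_h, v0, rng):
    """One CD-1 gradient estimate for a binary RBM."""
    # Positive phase: hidden activations driven by the data.
    h0_prob = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + b_v)
    h1_prob = sigmoid(v1_prob @ W + b_h)
    # CD-1 approximation to the log-likelihood gradient.
    dW = v0.T @ h0_prob - v1_prob.T @ h1_prob
    db_v = (v0 - v1_prob).sum(axis=0)
    db_h = (h0_prob - h1_prob).sum(axis=0)
    recon_err = np.mean((v0 - v1_prob) ** 2)
    return dW, db_v, db_h, recon_err

def train_rbm(data, n_hidden=16, epochs=20, lr=0.05, seed=0):
    """Train a binary RBM with CD-1 and an adaptive step size (heuristic)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    prev_err = np.inf
    for _ in range(epochs):
        dW, db_v, db_h, err = cd1_step(W, b_v, b_h, data, rng)
        W += lr * dW / n_samples
        b_v += lr * db_v / n_samples
        b_h += lr * db_h / n_samples
        # Illustrative adaptive step size only: not the scheme from the paper.
        lr = lr * 1.1 if err < prev_err else lr * 0.5
        prev_err = err
    return W, b_v, b_h, prev_err
```

In a DBN, this pre-training would be applied layer by layer: the hidden probabilities of one trained RBM become the visible data of the next.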