Graphics Processing Units (GPUs) can provide remarkable performance gains over CPUs for computationally intensive applications, which makes them attractive as dedicated hardware in many fields, including machine learning. In particular, implementing neural networks (NNs) on GPUs can drastically reduce the long training times incurred during the learning process. In this paper, we describe a parallel implementation of the Multiple Back-Propagation (MBP) algorithm and present the results obtained when running it on two well-known benchmarks. We show that, for both classification and regression problems, our implementation reduces the computational cost compared with the standalone CPU version.
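To make the parallelization opportunity concrete, the following is a minimal CPU sketch of standard single-hidden-layer back-propagation, not the authors' MBP implementation: in a GPU version, the per-neuron and per-weight loops below are the parts that map naturally onto parallel threads (e.g. one thread per weight update). All names and hyperparameters here are illustrative assumptions.

```python
import math
import random

def train_bp(patterns, targets, hidden=4, epochs=5000, lr=0.7, seed=0):
    """Minimal online back-propagation for one hidden layer (CPU reference).

    The nested loops over hidden neurons and weights are the hot spots a
    GPU implementation would parallelize; this sketch only shows the data
    flow, not the actual MBP kernel structure described in the paper.
    """
    rng = random.Random(seed)
    n_in = len(patterns[0])
    # Weights, with the bias folded in as an extra input fixed to 1.0.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            xb = x + [1.0]
            # Forward pass: each hidden neuron is independent -> parallelizable.
            h = [sig(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = sig(sum(w * v for w, v in zip(w_o, hb)))
            # Backward pass: output delta, then per-neuron hidden deltas.
            d_o = (t - y) * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Weight updates: every update is independent -> one thread each.
            for j in range(hidden + 1):
                w_o[j] += lr * d_o * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += lr * d_h[j] * xb[i]

    def predict(x):
        xb = x + [1.0]
        h = [sig(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
        return sig(sum(w * v for w, v in zip(w_o, h)))
    return predict

# XOR, the classic non-linearly-separable toy problem.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [0.0, 1.0, 1.0, 0.0]
predict = train_bp(X, T)
```

On a GPU, the hidden-neuron loop and the weight-update loops would each become a kernel launch over many threads, which is where the speedups reported in the paper come from.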