Neural Network Implementation Using CUDA and OpenMP

Authors:
Honghoon Jang, Anjin Park, Keechul Jung
Venue:
DICTA '08 Proceedings of the 2008 Digital Image Computing: Techniques and Applications
Year:
2008

Cited by (14):

Adaptative Resonance Theory Fuzzy Networks Parallel Computation Using CUDA
IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part I: Bio-Inspired Systems: Computational and Ambient Intelligence

Fast Pattern Classification of Ventricular Arrhythmias Using Graphics Processing Units
CIARP '09 Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications

Cortical architectures on a GPGPU
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units

GPU implementation of the multiple back-propagation algorithm
IDEAL'09 Proceedings of the 10th international conference on Intelligent data engineering and automated learning

Parallel implementation of Artificial Neural Network training for speech recognition
Pattern Recognition Letters

Parallel batch training of the self-organizing map using openCL
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: models and applications - Volume Part II

A case for neuromorphic ISAs
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems

Fuzzy ARTMAP based neural networks on the GPU for high-performance pattern recognition
IWINAC'11 Proceedings of the 4th international conference on Interplay between natural and artificial computation: new challenges on bioinspired applications - Volume Part II

A concurrent object-oriented approach to the eigenproblem treatment in shared memory multicore environments
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part I

Bayesian real-time perception algorithms on GPU
Journal of Real-Time Image Processing

Speeding up the training of neural networks with CUDA technology
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I

GPGPU implementation of growing neural gas: Application to 3D scene reconstruction
Journal of Parallel and Distributed Computing

Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method
The Journal of Supercomputing

Design exploration of quadrature methods in option pricing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Abstract

Many image processing and pattern recognition algorithms have recently been implemented on the GPU (graphics processing unit) to reduce computation time. However, GPU implementations face two problems. First, programmers must master graphics shading languages, which require prior knowledge of computer graphics. Second, in tasks that require close cooperation between the CPU and GPU, which are common in image processing and pattern recognition but not in graphics, the CPU must generate as much raw feature data as possible for the GPU to process if the GPU's performance is to be used effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (Compute Unified Device Architecture), which is easy to program because of its simple C-like style, instead of shader-based GPU programming. To address the second problem, OpenMP (Open Multi-Processing) is used on the multi-core CPU to apply the same instructions to multiple data concurrently, which makes more effective use of the GPU's memory. In experiments, we implemented a neural network-based text detection system using the proposed architecture; it ran about 15 times faster than a CPU-only implementation and about 4 times faster than a GPU implementation without OpenMP.