In this paper we describe the implementation of a complete ANN training procedure using the block-mode back-propagation learning algorithm for sequential patterns, such as the observation feature vectors of a speech recognition system, exploiting the high-performance SIMD architecture of GPUs through CUDA and its C-like language interface. We also compare this with the speed-up obtained by implementing the training procedure using only the multi-threading capabilities of multi-core processors. Our implementation takes into account all the peculiar aspects of training on large-scale sequential patterns, in particular the re-segmentation of the training sentences, the block sizes for the feed-forward and back-propagation steps, and the transfer of huge amounts of data from host memory to the GPU card. Our approach has been tested by training acoustic models for large-vocabulary speech recognition tasks, showing a sixfold reduction in the time required to train real-world, large networks with respect to an already optimized implementation based on the Intel MKL libraries. Thanks to these optimizations and to the support of the GPU, the training time for a language with a huge set of training sentences (about one million for Italian) can be reduced from approximately a month to five days.
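To make the block-mode idea concrete, the sketch below shows how the feed-forward step over a block of feature vectors can be expressed as a single matrix product on the GPU. This is a minimal illustration under assumed conventions, not the paper's actual code: the layer sizes, the sigmoid activation, the helper name forward_block, and the use of cuBLAS for the matrix product are all assumptions standing in for whatever routines the real implementation uses.

/* Hypothetical sketch of a block-mode feed-forward step for one ANN layer.
 * A block of B input vectors, stored as the columns of X, is propagated
 * with a single GEMM (Z = W * X) followed by an element-wise sigmoid.
 * Names and sizes are illustrative; compile with: nvcc block_ff.cu -lcublas */
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Element-wise logistic sigmoid applied in place to n floats. */
__global__ void sigmoid_kernel(float *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        z[i] = 1.0f / (1.0f + expf(-z[i]));
}

/* W: n_out x n_in weights, X: n_in x B input block, Z: n_out x B output.
 * All matrices are column-major, as cuBLAS expects. */
void forward_block(cublasHandle_t handle, const float *d_W, const float *d_X,
                   float *d_Z, int n_out, int n_in, int B)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* One GEMM replaces B separate matrix-vector products: this is what
     * lets the block mode exploit the SIMD architecture of the GPU. */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n_out, B, n_in,
                &alpha, d_W, n_out, d_X, n_in, &beta, d_Z, n_out);

    int n = n_out * B, threads = 256;
    sigmoid_kernel<<<(n + threads - 1) / threads, threads>>>(d_Z, n);
}

int main(void)
{
    const int n_in = 4, n_out = 3, B = 2;       /* toy sizes */
    float h_W[n_out * n_in], h_X[n_in * B], h_Z[n_out * B];
    for (int i = 0; i < n_out * n_in; ++i) h_W[i] = 0.01f * i;
    for (int i = 0; i < n_in * B; ++i)     h_X[i] = 1.0f;

    float *d_W, *d_X, *d_Z;
    cudaMalloc(&d_W, sizeof h_W);
    cudaMalloc(&d_X, sizeof h_X);
    cudaMalloc(&d_Z, sizeof h_Z);
    cudaMemcpy(d_W, h_W, sizeof h_W, cudaMemcpyHostToDevice);
    cudaMemcpy(d_X, h_X, sizeof h_X, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    forward_block(handle, d_W, d_X, d_Z, n_out, n_in, B);
    cudaMemcpy(h_Z, d_Z, sizeof h_Z, cudaMemcpyDeviceToHost);
    /* h_Z now holds sigmoid(W * X) for the whole block. */

    cublasDestroy(handle);
    cudaFree(d_W); cudaFree(d_X); cudaFree(d_Z);
    return 0;
}

Propagating B vectors at once turns B matrix-vector products into a single matrix-matrix product, which keeps the GPU's SIMD units busy; the block size trades device memory for throughput, which is presumably why the feed-forward and back-propagation block sizes are tuned separately. A production implementation would also overlap the host-to-device transfers of the training blocks with computation (e.g., cudaMemcpyAsync on CUDA streams), since moving the training data is itself a significant cost.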