Graphics processing units (GPUs) provide substantial processing power at little cost. We explore the application of GPUs to speech pattern processing, using language identification (LID) to demonstrate their benefits. Realizing the full potential of a GPU requires both effective coding of a given algorithm and, where there is a choice, selection of the technique for a specific function that is best able to exploit the GPU. We demonstrate these principles on the NIST LRE 2003 standard LID task, a batch-processing task involving the analysis of over 600 hours of speech. We focus on two parts of the system: the acoustic classifier, which is based on a 2048-component Gaussian mixture model (GMM), and acoustic feature extraction. For the latter we compare a conventional FFT-based analysis with IIR and FIR filter banks, both in terms of their ability to exploit the GPU architecture and in terms of LID performance. With no increase in error rate, our GPU-based system with an FIR-based front-end completes the NIST LRE 2003 task in 16 hours, compared with 180 hours for the conventional FFT-based system on a standard CPU (a speed-up factor of more than 11). This includes a 61% decrease in front-end processing time. In the GPU implementation, front-end processing accounts for 8% and 10% of total computing time during training and recognition, respectively, so the reduction in front-end processing achieved on the GPU is significant.
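The core of the acoustic classifier described above is evaluating every speech frame against every component of a large diagonal-covariance GMM, a computation that is data-parallel across (frame, component) pairs and therefore maps naturally onto a GPU. The following is a minimal NumPy sketch of that per-frame log-likelihood computation under illustrative assumptions (diagonal covariances, log-sum-exp over components); the function name, array shapes, and parameters are hypothetical and not taken from the paper's CUDA implementation.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames:    (T, D) feature vectors, one row per speech frame
    weights:   (M,)   mixture weights (sum to 1)
    means:     (M, D) component means
    variances: (M, D) diagonal covariances
    Returns an array of shape (T,).
    """
    D = frames.shape[1]
    # Log normalisation constant of each Gaussian component: shape (M,)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    # Squared Mahalanobis terms for every (frame, component) pair: (T, M)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    exponent = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + exponent
    # Numerically stable log-sum-exp over the M components
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()
```

The independence of the (frame, component) terms is what a GPU implementation exploits: each thread can own one pair, with only the final reduction over components requiring cooperation.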