Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition

Authors:
Paul R. Dixon; Tasuku Oonishi; Sadaoki Furui
Affiliations:
Department of Computer Science, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo 152-8552, Japan (all authors)
Venue:
Computer Speech and Language
Year:
2009



Abstract

In large vocabulary continuous speech recognition (LVCSR), the acoustic model computations often account for the largest share of processing time. Our weighted finite state transducer (WFST) based decoding engine can use a commodity graphics processing unit (GPU) to perform the acoustic computations, moving this burden off the main processor. In this paper we describe our new GPU scheme, which achieves a substantial improvement in recognition speed with no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models of varying complexity. The results consistently show that using the GPU reduces recognition time, with the largest improvements occurring in systems with large numbers of Gaussians. For the systems that achieve the best accuracy, we obtained speed-ups of between 2.5 and 3 times. The faster decoding times translate into reductions in space, power and hardware costs, because only standard hardware that is already widely installed is required.
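As an illustration of why acoustic likelihoods suit a GPU, the sketch below shows one common way to write diagonal-covariance Gaussian log-likelihoods as dot products, so that scoring many frames against many Gaussians becomes a single matrix product. This is a plain Python sketch under stated assumptions, not the scheme described in the paper: the abstract does not give the formulation, and the function names (`pack_gaussian`, `expand_frame`, `log_likelihoods`, `direct_log_likelihood`) are hypothetical. Mixture weights and the log-sum over mixture components are omitted.

```python
import math


def pack_gaussian(mean, var):
    """Pack one diagonal-covariance Gaussian into a single weight vector.

    Uses the identity
      log N(x) = const + sum_d (mean_d / var_d) * x_d - 0.5 * sum_d x_d^2 / var_d
    so that the log-likelihood is a dot product with [1, x, x^2].
    """
    const = -0.5 * sum(
        math.log(2.0 * math.pi * v) + m * m / v for m, v in zip(mean, var)
    )
    linear = [m / v for m, v in zip(mean, var)]
    quadratic = [-0.5 / v for v in var]
    return [const] + linear + quadratic


def expand_frame(x):
    """Expand a feature vector x into [1, x, x^2] to match pack_gaussian."""
    return [1.0] + list(x) + [xi * xi for xi in x]


def log_likelihoods(frames, gaussians):
    """Return scores[t][j] = log N(frames[t]; gaussians[j]).

    The nested dot products below form a matrix product
    (frames x parameters), which is the kind of dense, regular
    workload that parallel hardware handles well.
    """
    packed = [pack_gaussian(mean, var) for mean, var in gaussians]
    expanded = [expand_frame(x) for x in frames]
    return [
        [sum(w * f for w, f in zip(weights, feat)) for weights in packed]
        for feat in expanded
    ]


def direct_log_likelihood(x, mean, var):
    """Reference implementation: evaluate the Gaussian log-density directly."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


if __name__ == "__main__":
    # Toy data (made up for illustration): two 2-dimensional frames, two Gaussians.
    frames = [[0.5, -1.0], [2.0, 0.3]]
    gaussians = [([0.0, 0.0], [1.0, 1.0]), ([1.0, -0.5], [0.5, 2.0])]
    for row in log_likelihoods(frames, gaussians):
        print(row)
```

The packed parameters depend only on the acoustic model, so they can be computed once; only the frame expansion and the product are needed per utterance. The cost of the product grows with the number of Gaussians, which is consistent with the abstract's observation that the largest gains occur in systems with many Gaussians.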