Scalable HMM based inference engine in large vocabulary continuous speech recognition

  • Authors and affiliations:
  • Jike Chong (Department of Electrical Engineering and Computer Science, University of California, Berkeley)
  • Kisun You (School of Electrical Engineering, Seoul National University, and Intel Corporation)
  • Youngmin Yi (Department of Electrical Engineering and Computer Science, University of California, Berkeley)
  • Ekaterina Gonina (Department of Electrical Engineering and Computer Science, University of California, Berkeley)
  • Christopher Hughes (Intel Corporation)
  • Wonyong Sung (School of Electrical Engineering, Seoul National University)
  • Kurt Keutzer (Department of Electrical Engineering and Computer Science, University of California, Berkeley)

  • Venue:
  • ICME'09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
  • Year:
  • 2009

Abstract

Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this paper we explore a design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively utilize the parallelism opportunities in today's highly parallel processors. We propose four application-level implementation alternatives we call "algorithm styles", and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest-performing algorithm style varies with the implementation platform. On a 44-minute speech data set, we demonstrate substantial speedups of 3.4× on Core i7 and 10.5× on GTX280 compared to a highly optimized sequential implementation on Core i7, without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.