Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers

Authors:
Kisun You;Jungwook Choi;Wonyong Sung
Affiliations:
School of Electrical Engineering, Seoul National University, Seoul, Korea 151-744;School of Electrical Engineering, Seoul National University, Seoul, Korea 151-744;School of Electrical Engineering, Seoul National University, Seoul, Korea 151-744
Venue:
Journal of Signal Processing Systems
Year:
2012

Abstract

Hardware implementation of speech recognition can not only accelerate decoding for real-time processing but also reduce power consumption. Recently, the weighted finite state transducer (WFST) has emerged as a major recognition network representation because it reduces the algorithmic complexity of decoding by applying many optimizations to the network offline. However, hardware implementation of continuous speech recognition (CSR) with a WFST network is challenging, mainly because the Viterbi search must traverse a large network with limited hardware resources. This paper presents two hardware speech recognition systems based on the WFST network. The first, called the SRAM-oriented system, uses the internal SRAM as a hash table to manage the active working set efficiently. This system is flexible because it can easily accommodate different speech recognition tasks as long as the SRAM capacity allows. For easy expansion, we also propose the DRAM-oriented system, in which the active working set is stored in external DRAM. To hide the long latency of DRAM access, a split DRAM hash table is employed, which stores the active working set in the opened rows of DRAM to reduce the number of row misses. Experimental results show that the SRAM-oriented system decodes the 5k-word CSR task 4.93 times faster than real time, while the DRAM-oriented system runs 4.48 times faster than real time with only about half the SRAM capacity.
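
To make the idea of an active working set held in a hash table more concrete, the following is a minimal software sketch of frame-synchronous Viterbi search over a WFST with beam pruning. It is not the hardware design described in the paper: the Python dict only stands in for the SRAM hash table, and the toy network `ARCS`, the labels, the costs, and the function names `viterbi_step` and `decode` are all hypothetical choices made for illustration.

```python
# Hypothetical toy WFST: state -> list of (input_label, output_word, weight, next_state).
# Weights are negative log probabilities, so a lower cost is better.
ARCS = {
    0: [("a", "", 0.5, 1), ("b", "", 1.0, 2)],
    1: [("a", "", 0.2, 1), ("b", "hello", 0.7, 3)],
    2: [("b", "", 0.3, 2), ("a", "world", 0.9, 3)],
    3: [],
}


def viterbi_step(active, acoustic_cost, beam):
    """Advance the search by one frame.

    `active` maps state -> (accumulated cost, output words so far).
    `acoustic_cost` maps input label -> cost for the current frame.
    """
    new_active = {}
    for state, (cost, words) in active.items():
        for label, word, weight, nxt in ARCS[state]:
            total = cost + weight + acoustic_cost[label]
            # Hash-table lookup: keep only the best token per destination state.
            if nxt not in new_active or total < new_active[nxt][0]:
                new_active[nxt] = (total, words + ([word] if word else []))
    if not new_active:
        return new_active
    # Beam pruning keeps the active working set small.
    best = min(cost for cost, _ in new_active.values())
    return {s: t for s, t in new_active.items() if t[0] <= best + beam}


def decode(frames, beam=2.0):
    """Run the search over all frames, starting from state 0."""
    active = {0: (0.0, [])}
    for acoustic_cost in frames:
        active = viterbi_step(active, acoustic_cost, beam)
    return active


if __name__ == "__main__":
    # Two made-up frames of per-label acoustic costs.
    frames = [{"a": 0.1, "b": 2.0}, {"a": 1.5, "b": 0.1}]
    for state, (cost, words) in sorted(decode(frames).items()):
        print(state, round(cost, 3), words)
```

In this sketch, the number of entries in `active` after each frame corresponds to the working set that the hash table has to hold, which is why the beam width directly affects how much memory the search needs.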