A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer

  • Authors:
  • Edward C. Lin; Rob A. Rutenbar

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA (both authors)

  • Venue:
  • Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
  • Year:
  • 2009

Abstract

Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example, high-volume tasks such as video indexing (e.g., YouTube), or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component -- the backend search stage -- of a high-speed, large-vocabulary recognizer. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, we handle a standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, our backend search engine decodes on average 10x faster than real-time with negligible degradation in accuracy, despite a clock rate ~30x slower than that of a conventional server. To the best of our knowledge, this is both the most complex and the fastest recognizer ever realized in hardware.
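The abstract's two numbers imply a large per-cycle efficiency gain, which a quick back-of-envelope calculation makes concrete. The sketch below assumes (these figures are not stated in the paper) a software baseline decoding at roughly 1x real-time on a server clocked ~30x faster than the FPGA's 100 MHz, i.e. ~3 GHz:

```python
# Back-of-envelope check of the speedup claim: decoding throughput per
# clock cycle, FPGA engine vs. an assumed software baseline.

fpga_clock_hz = 100e6                  # BEE2 search engine clock (from the abstract)
server_clock_hz = 30 * fpga_clock_hz   # "~30x" faster server clock, i.e. ~3 GHz

fpga_speed = 10.0    # FPGA engine: 10x real-time (from the abstract)
server_speed = 1.0   # software baseline: ~1x real-time (assumption)

# Decoding work per clock cycle, normalized to the server baseline.
per_cycle_advantage = (fpga_speed * server_clock_hz) / (server_speed * fpga_clock_hz)
print(per_cycle_advantage)  # 300.0 -- the FPGA does ~300x more decoding per cycle
```

Under these assumptions the hardware engine extracts roughly 300x more recognition work out of each clock cycle, which is the point of the "~30x slower clock" comparison.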