Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
Hidden Markov Models for Speech Recognition
Hidden Markov Models for Speech Recognition
Let's Hear It for Audio Mining
Computer
A low-power accelerator for the SPHINX 3 speech recognition system
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Architectural optimizations for low-power, real-time speech recognition
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
BEE2: A High-End Reconfigurable Computing System
IEEE Design & Test
Hardware speech recognition for user interfaces in low cost, low power devices
Proceedings of the 42nd annual Design Automation Conference
Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
A real-time FPGA-based 20 000-word speech recognizer with optimized DRAM access
IEEE Transactions on Circuits and Systems Part I: Regular Papers
FPGA implementation for GMM-based speaker identification
International Journal of Reconfigurable Computing - Special issue on selected papers from the southern programmable logic conference (SPL2010)
International Journal of Reconfigurable Computing - Special issue on selected papers from the southern programmable logic conference (SPL2010)
Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition
Journal of Signal Processing Systems
Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers
Journal of Signal Processing Systems
Hi-index | 0.00 |
Today's best quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example, high-volume tasks such as video indexing (e.g., YouTube), or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component -- the backend search stage -- of a high-speed, large-vocabulary recognizer. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, we handle a standard 5000-word Wall Street Journal speech benchmark. Our backend search engine can decode on average 10 times faster than real-time running at 100 MHz, i.e, 10x faster than real-time, with negligible degradation in accuracy, running at a clock rate ~ 30x slower than a conventional server. To the best of our knowledge, this is both the most complex, and the fastest recognizer ever to be realized in a hardware form.