A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA

Authors:
Edward C. Lin; Kai Yu; Rob A. Rutenbar; Tsuhan Chen
Affiliations:
Carnegie Mellon University, Pittsburgh, PA (all authors)
Venue:
Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
Year:
2007

Abstract

The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that speech recognition, like graphics, is simply too performance-hungry and too power-sensitive to remain a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000-word vocabulary, is speaker-independent, handles continuous (connected) speech, and is a "live mode" engine, in which recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small and achieves the same accuracy as state-of-the-art software recognizers while running at a fraction of their clock speed.