A low-power accelerator for the SPHINX 3 speech recognition system

Authors:
Binu Mathew;Al Davis;Zhen Fang
Affiliations:
University of Utah, Salt Lake City, UT;University of Utah, Salt Lake City, UT;University of Utah, Salt Lake City, UT
Venue:
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2003

Abstract

Accurate real-time speech recognition is not currently possible in the mobile embedded space, where the need for natural voice interfaces is clearly important. The continuous nature of speech recognition, coupled with an inherently large working set, creates significant cache interference with other processes. Hence, real-time recognition is problematic even on high-performance general-purpose platforms. This paper provides a detailed analysis of CMU's latest speech recognizer (Sphinx 3.2), identifies three distinct processing phases, and quantifies the architectural requirements of each phase. It then describes several optimizations that expose parallelism and drastically reduce the bandwidth and power requirements for real-time recognition. A special-purpose accelerator for the dominant Gaussian probability phase is developed for a 0.25 μm CMOS process, then analyzed and compared with Sphinx's measured energy and performance on a 0.13 μm, 2.4 GHz Pentium 4 system. The results show a 29-fold reduction in power consumption at equivalent processing throughput. However, after normalizing for process, the special-purpose approach has twice the throughput and consumes 104 times less energy than the general-purpose processor. Because energy consumption and performance can be traded against each other, the energy-delay product is a better comparison metric; the energy-delay product of the special-purpose approach is 196 times better than that of the Pentium 4. These results provide strong evidence that real-time large-vocabulary speech recognition can be done within a power budget commensurate with embedded processing using today's technology.
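The abstract's case for the energy-delay product (EDP) over raw power can be illustrated with a small sketch. All numbers below are hypothetical, not measurements from this paper. As a simplifying assumption, running the same design at half the clock rate is taken to leave the energy per workload unchanged (ignoring leakage and voltage scaling). Under that assumption, slowing a processor down halves its power without saving any energy, and its EDP doubles, so power alone can flatter a slow design. The names `Run` and `improvement` are illustrative, not from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One run over a fixed workload (hypothetical numbers, not from the paper)."""
    energy_j: float  # total energy for the workload, in joules
    delay_s: float   # time to finish the workload, in seconds

    @property
    def power_w(self) -> float:
        # Average power is energy divided by time.
        return self.energy_j / self.delay_s

    @property
    def edp(self) -> float:
        # Energy-delay product in joule-seconds: lower is better.
        return self.energy_j * self.delay_s


def improvement(baseline: Run, candidate: Run, metric: str) -> float:
    """Factor by which `candidate` beats `baseline` on a lower-is-better metric."""
    return getattr(baseline, metric) / getattr(candidate, metric)


# Case 1: the same design at half the clock rate.
# Assumption: energy per workload stays the same, delay doubles.
fast = Run(energy_j=10.0, delay_s=1.0)
slow = Run(energy_j=10.0, delay_s=2.0)
print(improvement(fast, slow, "power_w"))   # 2.0: half the power...
print(improvement(fast, slow, "energy_j"))  # 1.0: ...but no energy saved
print(improvement(fast, slow, "edp"))       # 0.5: and EDP is twice as bad

# Case 2: a hypothetical accelerator that uses 10x less energy and is 2x faster.
baseline = Run(energy_j=10.0, delay_s=1.0)
accel = Run(energy_j=1.0, delay_s=0.5)
print(improvement(baseline, accel, "energy_j"))  # 10.0
print(improvement(baseline, accel, "delay_s"))   # 2.0
print(improvement(baseline, accel, "edp"))       # 20.0: energy and delay gains multiply
```

Because EDP is a product, a design's EDP improvement is its energy improvement multiplied by its speedup, which is why EDP rewards designs that are both efficient and fast rather than ones that merely run slowly.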