FPGA implementation of a pipelined Gaussian calculation for HMM-based large vocabulary speech recognition

Authors:
Richard Veitch;Louis-Marie Aubert;Roger Woods;Scott Fischaber
Affiliations:
Electronics, Communications and Information Technology, Queen's University Belfast, Belfast, UK;Electronics, Communications and Information Technology, Queen's University Belfast, Belfast, UK;Electronics, Communications and Information Technology, Queen's University Belfast, Belfast, UK;Electronics, Communications and Information Technology, Queen's University Belfast, Belfast, UK
Venue:
International Journal of Reconfigurable Computing - Special Issue on Selected Papers from the Southern Programmable Logic Conference (SPL2010)
Year:
2011

Abstract

A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, both of which are covered here. We present an efficient pipelined design of the Gaussian calculation, implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and can calculate a full set of Gaussian results from 3825 acoustic models in 9.03 ms; coupled with a backend search of 5000 words, it has provided an accuracy of over 80%. Parallel implementations with up to 32 cores have been designed and successfully implemented at a clock frequency of 133 MHz.
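
The abstract does not state the form of the Gaussian calculation. As a rough illustration only, the sketch below scores one feature vector against a set of acoustic models, assuming diagonal-covariance Gaussians evaluated in the log domain, which is a common choice in HMM-based recognizers but is not confirmed by the text. The names `log_gaussian_diag` and `score_all_models` are hypothetical and do not come from the paper. The inner loop is a subtract, square, scale, and accumulate sequence, the kind of regular arithmetic that lends itself to a hardware pipeline.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x.

    Assumption: diagonal covariance and log-domain scoring; the abstract
    does not specify either.
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")

    # Normalization term depends only on the model, so it could be
    # precomputed once per model and stored alongside the parameters.
    log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)

    # Per-dimension subtract, square, scale, accumulate.
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        diff = xi - mi
        acc += diff * diff / vi

    return log_norm - 0.5 * acc


def score_all_models(x, models):
    """Score one feature vector against every (mean, var) pair in models."""
    return [log_gaussian_diag(x, mean, var) for mean, var in models]
```

In this sketch each model is scored independently of the others, so the model set could in principle be split across several cores, in the spirit of the parallel implementations the abstract mentions.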