Robust speech recognition with nonstationary noise

Authors:
Dong Zhang; Chunxie Xie
Affiliations:
Department of Mechanical Engineering, South China University of Technology, Guangzhou, Guangdong Province, China (both authors)
Venue:
SPPRA'06 Proceedings of the 24th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications
Year:
2006

Abstract

Speech recognition can improve access to equipment control systems for people with physical disabilities or in situations that interfere with the use of the hands. In this paper, a robust speech recognition system is introduced into a robotic hospital bed control system to improve recognition in noisy conditions. The system combines second-order frequency filtering (FF2) with the RelAtive SpecTrAl (RASTA) technique. Two experiments compared traditional Mel-frequency cepstral coefficients (MFCCs) with the new technique, using a conventional HMM/Gaussian mixture model (HMM/GMM) based recognition system, on both clean and noisy speech. These tests show that the recognition system usually achieves better results when RASTA filtering is applied to the FF2 features, especially in less stationary noise conditions. This suggests that combining FF2 and RASTA filtering, the former operating over frequency and the latter over time, may cancel out different noise components in the speech signal.
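
To make the frequency/time distinction concrete, the following is a minimal sketch of the two filtering steps applied to log filter-bank energies. It is not the authors' implementation: the abstract gives no filter coefficients, so the sketch assumes the commonly cited forms, an FF2 filter z - z^-1 applied across bands within each frame and a causal RASTA band-pass filter with numerator 0.1 * (2 + z^-1 - z^-3 - 2z^-4) and a pole at 0.98 applied along time in each band. The zero padding at the band edges, the pole value, and the function names `ff2`, `rasta`, and `ff2_rasta` are likewise assumptions made for illustration.

```python
def ff2(log_energies):
    """Filter one frame across frequency with the assumed FF2 filter z - z^-1.

    Output for band k is E[k+1] - E[k-1]; bands outside the filter bank
    are treated as zero (an assumption about edge handling).
    """
    padded = [0.0] + list(log_energies) + [0.0]
    return [padded[k + 2] - padded[k] for k in range(len(log_energies))]


def rasta(track, pole=0.98):
    """Filter one band's trajectory over time with an assumed causal RASTA filter.

    y[t] = pole * y[t-1] + 0.1 * (2x[t] + x[t-1] - x[t-3] - 2x[t-4])
    The numerator sums to zero, so a constant offset is suppressed.
    """
    numerator = (0.2, 0.1, 0.0, -0.1, -0.2)
    out = []
    prev = 0.0
    for t in range(len(track)):
        fir = sum(c * track[t - i] for i, c in enumerate(numerator) if t - i >= 0)
        prev = pole * prev + fir
        out.append(prev)
    return out


def ff2_rasta(frames, pole=0.98):
    """Apply FF2 within each frame, then RASTA along time in each band.

    `frames` is a list of frames, each a list of log filter-bank energies.
    """
    freq_filtered = [ff2(frame) for frame in frames]
    n_bands = len(freq_filtered[0])
    tracks = [rasta([f[b] for f in freq_filtered], pole) for b in range(n_bands)]
    return [[tracks[b][t] for b in range(n_bands)] for t in range(len(frames))]


if __name__ == "__main__":
    # Toy input: 6 frames of 4 log filter-bank energies (made-up values).
    toy = [[1.0, 2.0, 4.0, 3.0] for _ in range(6)]
    for row in ff2_rasta(toy):
        print(["%.3f" % v for v in row])
```

Because both filters are linear and act on different axes, the order in which they are applied does not change the result in this sketch. The frequency filter removes components that vary slowly across bands, while the time filter removes components that vary slowly across frames, which is consistent with the abstract's suggestion that the two steps target different noise components.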