Multi-environment model adaptation based on vector Taylor series for robust speech recognition

Authors:
Yong Lü; Haiyang Wu; Lin Zhou; Zhenyang Wu
Affiliations:
School of Information Science and Engineering, Southeast University, Nanjing 210096, China (all authors)
Venue:
Pattern Recognition
Year:
2010

Abstract

In this paper, we propose a multi-environment model adaptation method based on vector Taylor series (VTS) for robust speech recognition. In the training phase, clean training speech is contaminated with noise at several signal-to-noise ratio (SNR) levels to produce multiple types of noisy training speech, and each type is used to train a separate noisy hidden Markov model (HMM) set. In the recognition phase, the HMM set that best matches the testing environment is selected and then further adjusted by VTS-based model adaptation to reduce the remaining environmental mismatch. In the proposed method, the VTS approximation is formulated with respect to the noisy training speech, and the testing noise parameters are estimated from the noisy testing speech using the expectation-maximization (EM) algorithm. Experimental results indicate that the proposed multi-environment model adaptation method significantly improves recognition performance and outperforms both the traditional model adaptation method and the linear regression-based multi-environment method.
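The abstract describes the pipeline only at a high level. The following minimal sketch illustrates three of its ingredients under stated assumptions. The first is mixing noise into clean speech at a target SNR to build multi-condition training data. The second is choosing among model sets; the abstract does not say how the best-matching set is selected, so the likelihood criterion here is a hypothetical stand-in. The third is conventional first-order VTS adaptation of one diagonal Gaussian in the log-spectral domain with additive noise only. The sketch does not reproduce the paper's expansion around noisy training speech, any channel term, cepstral-domain transforms, or the EM noise estimation. Each model set is reduced to a single Gaussian, and all function names (`add_noise_at_snr`, `select_model_set`, `vts_adapt_gaussian`, etc.) are hypothetical.

```python
import math


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db` dB, then add it to `clean`."""
    n = len(clean)
    p_clean = sum(s * s for s in clean) / n
    p_noise = sum(v * v for v in noise[:n]) / n  # assumes len(noise) >= len(clean)
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(clean, noise)]


def diag_gauss_loglik(frame, mu, var):
    """Log-likelihood of one feature frame under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (f - m) ** 2 / v
                      for f, m, v in zip(frame, mu, var))


def select_model_set(frames, model_sets):
    """Hypothetical selection rule: return the name of the model set whose
    (single) Gaussian gives the test frames the highest total log-likelihood."""
    return max(model_sets,
               key=lambda name: sum(diag_gauss_loglik(f, *model_sets[name]) for f in frames))


def _softplus(d):
    """log(1 + exp(d)), evaluated without overflow for large |d|."""
    return d + math.log1p(math.exp(-d)) if d > 0 else math.log1p(math.exp(d))


def vts_adapt_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order VTS adaptation of a diagonal Gaussian in the log-spectral domain.

    Assumed mismatch function (additive noise, no channel):
        y = x + log(1 + exp(n - x)),
    linearised at (x, n) = (mu_x, mu_n).
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        d = mn - mx
        mu_y.append(mx + _softplus(d))   # adapted mean: mismatch function at the expansion point
        g = math.exp(-_softplus(d))      # dy/dx = 1 / (1 + exp(n - x)); dy/dn = 1 - g
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y


if __name__ == "__main__":
    mu_y, var_y = vts_adapt_gaussian([2.0], [0.5], [2.0], [0.1])
    print(mu_y, var_y)  # mean ≈ 2 + ln 2 ≈ 2.693, variance ≈ 0.15
```

When the noise mean equals the speech mean, the Jacobian g is 0.5, so the adapted mean rises by ln 2 and the variance becomes a 0.25-weighted sum of the speech and noise variances. When the noise is far below the speech, g approaches 1 and the Gaussian is left essentially unchanged.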