In this paper, we present an effective cepstral feature compensation scheme which leverages knowledge of the speech model in order to achieve robust speech recognition. In the proposed scheme, the requirement for a prior noisy speech database in off-line training is eliminated by employing parallel model combination for the noise-corrupted speech model. Gaussian mixture models of clean speech and noise are used for the model combination. The noisy speech model can therefore be adapted simply by updating the noise model. Since the combination is performed in the cepstral domain, the method offers reduced computational expense and improved accuracy in model estimation. In order to cope with time-varying background noise, a novel method of interpolating multiple environmental models is employed. By sequentially calculating the posterior probability of each environmental model, the compensation procedure can be applied on a frame-by-frame basis. To reduce the computational expense introduced by the multiple-model method, a technique for sharing similar Gaussian components is proposed. Acoustically similar components across an inventory of environmental models are selected by the proposed sub-optimal algorithm, which employs the Kullback-Leibler divergence as a similarity measure. The combined hybrid model, which consists of the selected Gaussian components, is used as the shared noisy speech model. The performance is examined using Aurora2 and speech data collected in an in-vehicle environment. The proposed feature compensation algorithm is compared with standard methods in the field (e.g., CMN, spectral subtraction, RATZ). The experimental results demonstrate that the proposed feature compensation schemes are very effective in achieving robust speech recognition in adverse noisy environments, and that the proposed model combination-based feature compensation method is superior to existing model-based feature compensation methods.
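The model-combination and multiple-model interpolation ideas described above can be sketched as follows. This is an illustrative outline only, not the paper's actual implementation: it combines clean-speech Gaussian means with a noise mean via the log-add approximation in the log-spectral domain (the paper works in the cepstral domain via DCT/IDCT, omitted here), and computes per-frame posterior probabilities over a set of environment models assuming equal-weight, shared-variance diagonal Gaussians. All function names and shapes are assumptions.

```python
import numpy as np

def combine_models(clean_means, noise_mean):
    """Log-add approximation: mu_noisy = log(exp(mu_clean) + exp(mu_noise)).

    Operates on log-spectral means; the cepstral-domain version would
    wrap this with DCT/inverse-DCT transforms.
    """
    return np.log(np.exp(clean_means) + np.exp(noise_mean))

def gaussian_loglik(x, means, var):
    """Diagonal-covariance log-likelihood of frame x under each mean row."""
    d = x.shape[-1]
    diff = x - means
    return -0.5 * (np.sum(diff ** 2 / var, axis=-1)
                   + d * np.log(2.0 * np.pi) + np.sum(np.log(var)))

def environment_posteriors(x, env_models, var):
    """Posterior probability of each environment model for a single frame.

    Each entry of env_models is a (K, d) array of noisy-speech mixture
    means (equal mixture weights assumed for brevity).
    """
    logliks = np.array([
        np.logaddexp.reduce(gaussian_loglik(x, means, var))
        for means in env_models
    ])
    w = np.exp(logliks - logliks.max())   # stable softmax over models
    return w / w.sum()
```

In a full compensation loop, these posteriors would weight the per-environment compensation terms frame by frame, which is what lets the scheme track time-varying background noise without committing to a single environment model.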
Of particular interest, the proposed method achieves up to an 11.59% relative WER reduction compared with the ETSI AFE front-end. The multiple-model approach is effective at coping with changing noise conditions in the input speech, producing performance comparable to the matched-model condition. Applying the mixture-sharing method brings a significant reduction in computational overhead while maintaining recognition performance at a reasonable level with near real-time operation.
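The mixture-sharing step can likewise be sketched. The snippet below is a hypothetical, greedy stand-in for the paper's sub-optimal selection algorithm: it pools diagonal-covariance Gaussian components from all environment models and assigns each one to an existing representative whenever their symmetrized Kullback-Leibler divergence falls below a threshold, otherwise promoting it to a new representative. Names, the greedy strategy, and the threshold are assumptions for illustration.

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    """KL(N1 || N2) between diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def select_shared(components, threshold):
    """Greedy grouping of pooled (mean, var) components across models.

    Each component joins the first representative within `threshold`
    symmetric KL divergence; otherwise it starts a new group. The
    representatives form the combined hybrid model.
    """
    reps = []          # representative components of the hybrid model
    assignment = []    # representative index for each input component
    for m, v in components:
        for i, (rm, rv) in enumerate(reps):
            d = kl_diag_gauss(m, v, rm, rv) + kl_diag_gauss(rm, rv, m, v)
            if d < threshold:
                assignment.append(i)
                break
        else:
            assignment.append(len(reps))
            reps.append((m, v))
    return reps, assignment
```

Because acoustically similar components collapse onto one shared Gaussian, the per-frame likelihood computations over the model inventory shrink accordingly, which is the source of the computational savings reported above.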