Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation

Authors:
Yu Tsao;Xugang Lu;Paul Dixon;Ting-Yao Hu;Shigeki Matsuda;Chiori Hori
Venue:
Computer Speech and Language
Year:
2014

Abstract

The maximum a posteriori (MAP) criterion is widely used for feature compensation (FC) and acoustic model adaptation (MA) to reduce the mismatch between training and testing data sets. MAP-based FC and MA require prior densities for the mapping function parameters, and the design of these prior densities strongly affects performance. In this paper, we propose an environment structuring framework that provides suitable prior densities for MAP-based FC and MA in robust speech recognition. The framework is a two-stage hierarchical tree built by environment clustering and partitioning processes. It characterizes local information about complex speaker and speaking acoustic conditions. This local information is used to specify the hyper-parameters of the prior densities, which are then used in MAP-based FC and MA to handle the mismatch. We evaluated the proposed framework on Aurora-2, a connected digit recognition task, and Aurora-4, a large vocabulary continuous speech recognition (LVCSR) task. On both tasks, experimental results showed that the environment structuring framework yields suitable prior densities that improve the performance of MAP-based FC and MA.
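
The idea of drawing prior hyper-parameters from a local group of environments can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not specify the mapping functions or the tree construction, so the sketch assumes a scalar bias as the mapping parameter, a conjugate Gaussian prior, hand-labelled environment clusters in place of the two-stage tree, and hypothetical names throughout (`cluster_prior`, `pick_cluster`, `map_bias`, the `car` and `babble` clusters, and all numbers).

```python
from statistics import fmean, pvariance

def cluster_prior(biases):
    """Gaussian hyper-parameters (mean, variance) from one environment cluster.

    `biases` are bias values observed in training environments assigned to
    the same cluster (an assumption standing in for the paper's tree nodes).
    """
    mu0 = fmean(biases)
    tau2 = max(pvariance(biases), 1e-6)  # floor to avoid a degenerate prior
    return mu0, tau2

def pick_cluster(residuals, priors):
    """Select the cluster whose prior mean is closest to the ML bias estimate."""
    ml = fmean(residuals)
    return min(priors, key=lambda name: abs(priors[name][0] - ml))

def map_bias(residuals, mu0, tau2, sigma2=1.0):
    """Posterior mean of a scalar bias b.

    Assumes residuals ~ N(b, sigma2) and prior b ~ N(mu0, tau2), so the MAP
    estimate is the precision-weighted average of prior mean and data.
    """
    precision = 1.0 / tau2 + len(residuals) / sigma2
    return (mu0 / tau2 + sum(residuals) / sigma2) / precision

# Hypothetical training environments, already grouped into clusters.
clusters = {"car": [1.8, 2.0, 2.2], "babble": [-1.1, -0.9, -1.0]}
priors = {name: cluster_prior(vals) for name, vals in clusters.items()}

# A short test utterance: few frames, so the ML estimate alone is noisy.
residuals = [2.6, 1.2]  # observed feature minus model mean (illustrative)
chosen = pick_cluster(residuals, priors)
mu0, tau2 = priors[chosen]
b_map = map_bias(residuals, mu0, tau2)
compensated = [r - b_map for r in residuals]  # bias-compensated residuals
```

With only two frames, the estimate is pulled from the maximum-likelihood value toward the mean of the selected cluster; a very broad prior (large `tau2`) would leave the maximum-likelihood estimate essentially unchanged.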