Prior-shared feature and model space speaker adaptation by consistently employing MAP estimation

Authors:
Seong-Jun Hahm;Shinji Watanabe;Atsunori Ogawa;Masakiyo Fujimoto;Takaaki Hori;Atsushi Nakamura
Affiliations:
NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan;NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan;NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan;NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan;NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan;NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan
Venue:
Speech Communication
Year:
2013

Abstract

This paper describes a speaker adaptation method that improves speech recognition performance regardless of the amount of adaptation data. To this end, we propose consistently employing maximum a posteriori (MAP)-based Bayesian estimation for both feature space normalization and model space adaptation. Specifically, constrained structural maximum a posteriori linear regression (CSMAPLR) is first performed in the feature space to compensate for speaker characteristics, and then structural maximum a posteriori linear regression (SMAPLR) is performed in the model space to capture the remaining speaker characteristics. A prior distribution stabilizes parameter estimation, especially when the amount of adaptation data is small. In the proposed method, CSMAPLR and SMAPLR are performed based on the same acoustic model, so the dimension-dependent variations of the feature and model spaces can be similar. Because these dimension-dependent variations of the transformation matrix are explained well by the prior distribution, sharing the same prior distribution between CSMAPLR and SMAPLR allows their parameter estimations to be appropriately regularized in both spaces. Experiments on large vocabulary continuous speech recognition using the Corpus of Spontaneous Japanese (CSJ) and the MIT OpenCourseWare corpus (MIT-OCW) confirm the effectiveness of the proposed method compared with conventional adaptation methods, both with and without speaker adaptive training.
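
The two-stage idea in the abstract can be illustrated with a deliberately simplified toy, which is not the authors' CSMAPLR/SMAPLR implementation. It assumes bias-only transforms, a single Gaussian with unit variance, and a Gaussian prior whose strength is a pseudo-count `tau`. The names `map_bias`, `adapt`, `prior_mean`, and `tau` are hypothetical. The sketch only shows how one shared prior can regularize a feature space estimate and a subsequent model space estimate, and how the prior dominates when adaptation data is scarce.

```python
import numpy as np


def map_bias(residuals, prior_mean, tau):
    """MAP estimate of a per-dimension bias under a Gaussian prior.

    residuals:  (n, d) array of observed per-frame offsets
    prior_mean: (d,) prior mean of the bias (zeros means "no shift")
    tau:        prior weight, acting like a pseudo-count of prior frames
    """
    residuals = np.atleast_2d(residuals)
    n = residuals.shape[0]
    # Conjugate-Gaussian MAP: weighted average of prior mean and data sum.
    return (tau * prior_mean + residuals.sum(axis=0)) / (tau + n)


def adapt(features, model_mean, prior_mean, tau):
    """Toy two-stage adaptation that reuses one prior in both spaces."""
    # Stage 1 (feature space): shift the features toward the model mean.
    b_feat = map_bias(model_mean - features, prior_mean, tau)
    normalized = features + b_feat
    # Stage 2 (model space): shift the model mean toward the remaining
    # mismatch, regularized by the same prior (assumption of this toy).
    b_model = map_bias(normalized - model_mean, prior_mean, tau)
    return normalized, model_mean + b_model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_mean = np.zeros(3)
    prior_mean = np.zeros(3)
    # A hypothetical speaker whose features are offset from the model.
    frames = rng.normal(loc=1.5, scale=1.0, size=(5, 3))
    normalized, adapted_mean = adapt(frames, model_mean, prior_mean, tau=10.0)
    print("mean of normalized features:", normalized.mean(axis=0))
    print("adapted model mean:", adapted_mean)
```

With `n = 0` frames, `map_bias` returns the prior mean unchanged, and as `n` grows relative to `tau` the estimate approaches the sample mean. This mirrors the stabilizing role the abstract attributes to the prior distribution.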