A study on speaker adaptation of the parameters of continuous-density hidden Markov models

Authors:
C.-H. Lee;C.-H. Lin;B.-H. Juang
Affiliations:
AT&T Bell Labs., Murray Hill, NJ;-;-
Venue:
IEEE Transactions on Signal Processing
Year:
1991

Citing 0
Cited 11

Improved acoustic modeling for continuous speech recognition
HLT '90 Proceedings of the workshop on Speech and Natural Language

Subspace distance analysis with application to adaptive Bayesian algorithm for face recognition
Pattern Recognition

Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning
Computer Speech and Language

Prior knowledge guided maximum expected likelihood based model selection and adaptation for nonnative speech recognition
Computer Speech and Language

Automatic speech recognition and speech variability: A review
Speech Communication

Incremental HMM training applied to ECG signal analysis
Computers in Biology and Medicine

Feature compensation in the cepstral domain employing model combination
Speech Communication

Rapid and brief communication: Intrapersonal subspace analysis with application to adaptive Bayesian face recognition
Pattern Recognition

Real-world acoustic event detection
Pattern Recognition Letters

Combining pulse-based features for rejecting far-field speech in a HMM-based Voice Activity Detector
Computers and Electrical Engineering

Prior-shared feature and model space speaker adaptation by consistently employing map estimation
Speech Communication

Quantified Score

Hi-index	35.68

Abstract

For a speech-recognition system based on continuous-density hidden Markov models (CDHMMs), speaker adaptation of the CDHMM parameters is formulated as a Bayesian learning procedure. A speaker adaptation procedure is presented that integrates easily into the segmental k-means training procedure and yields adaptive estimates of the CDHMM parameters. Results are reported for adapting both the mean and the diagonal covariance matrix of the Gaussian state observation densities of a CDHMM. Tests on a 39-word English alpha-digit vocabulary in isolated-word mode indicate that, when one training token per word is used for adaptation, the procedure achieves the same level of performance as a speaker-independent system. Much better performance is achieved when two or more training tokens are used for speaker adaptation. Compared with a speaker-dependent system, speaker adaptation always performs as well as or better than speaker-dependent training using the same amount of training data.
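As a rough illustration of what Bayesian adaptation of a Gaussian mean can look like, the sketch below uses the standard conjugate-prior result for a Gaussian with known variance: the adapted mean is a weighted average of a prior (speaker-independent) mean and the new speaker's data. This is not taken from the paper. The abstract does not state the prior, the update formulas, or how the covariance is adapted, so the function name `map_adapt_mean`, the prior weight `tau`, and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    tau: float,
) -> List[float]:
    """Per-dimension MAP estimate of a Gaussian mean (illustrative only).

    Assumes a known variance and a normal prior centred on prior_mean,
    where tau > 0 (a hypothetical parameter) acts as the number of
    "pseudo-observations" carried by the prior.
    """
    n = len(frames)
    dim = len(prior_mean)

    # Sum the adaptation frames independently in each dimension,
    # which matches a diagonal-covariance model.
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]

    # Weighted average of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Hypothetical speaker-independent mean and two adaptation frames.
si_mean = [0.0, 1.0]
frames = [[2.0, 3.0], [4.0, 5.0]]
adapted = map_adapt_mean(si_mean, frames, tau=2.0)
print(adapted)  # [1.5, 2.5]
```

With no adaptation frames the estimate stays at the prior mean, and as more frames arrive it moves toward the sample mean of the new speaker's data. That behavior is in line with the abstract's finding that performance improves as more training tokens are used.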