A study on speaker-adaptive speech recognition

Authors:
X. D. Huang
Venue:
HLT '91 Proceedings of the workshop on Speech and Natural Language
Year:
1991

Abstract

A speaker-independent system is desirable in many applications where speaker-specific data do not exist. However, if speaker-dependent data are available, the system can be adapted to the specific speaker so that the error rate is significantly reduced. In this paper, the DARPA Resource Management task is used as the domain for investigating the performance of speaker-adaptive speech recognition. Since adaptation starts from a speaker-independent system and has only limited adaptation data, a good adaptation algorithm should be consistent with the speaker-independent parameter estimation criterion and should adapt those parameters that are less sensitive to the limited training data. Two parameter sets, the codebook mean vectors and the output distributions, are regarded as the most important. They are modified according to the characteristics of each speaker within the framework of the maximum likelihood estimation criterion. To estimate these parameters reliably, output distributions that exhibit a certain degree of acoustic similarity are shared with each other. In addition to modifying these parameters, speaker normalization with neural networks is also studied, in the hope that acoustic data normalization will not only adapt the system rapidly but also enhance the robustness of speaker-independent speech recognition. Preliminary results indicate that speaker differences can be substantially reduced. Compared with speaker-independent recognition, using only the parameter adaptation techniques with 40 adaptation sentences per speaker reduced the error rate from 4.3% to 3.1% (about a 28% relative reduction). When the number of speaker adaptation sentences is comparable to that used for speaker-dependent training, speaker-adaptive recognition outperforms the best speaker-dependent recognition results on the same test set, which indicates the robustness of speaker-adaptive speech recognition.
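The two parameter-adaptation steps described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a semi-continuous (shared-codebook) HMM, in which each output distribution is a discrete weight vector over codebook entries, and it assumes that per-frame codeword posteriors and per-distribution expected codeword counts have already been computed by a forward-backward pass with the speaker-independent model. The function names, the occupancy threshold, the probability floor, and the `tie_map` clustering are hypothetical illustrations, not the paper's implementation. Codebook means are re-estimated as occupancy-weighted averages of the adaptation frames (the maximum likelihood update), and output distributions are re-estimated from counts pooled over clusters of similar distributions.

```python
import numpy as np


def adapt_codebook_means(frames, gamma, si_means, min_occupancy=1e-3):
    """ML re-estimation of codebook means from one speaker's adaptation data.

    frames:   (T, D) adaptation feature vectors
    gamma:    (T, K) posterior that frame t was produced by codeword k
              (assumed to come from a forward-backward pass with the SI model)
    si_means: (K, D) speaker-independent codebook mean vectors
    """
    occupancy = gamma.sum(axis=0)              # (K,) soft count per codeword
    weighted_sum = gamma.T @ frames            # (K, D) sum_t gamma[t, k] * x_t
    adapted = si_means.astype(float).copy()
    seen = occupancy > min_occupancy           # codewords this speaker actually used
    # ML update: occupancy-weighted mean of the frames; unseen codewords keep SI means
    adapted[seen] = weighted_sum[seen] / occupancy[seen][:, None]
    return adapted


def adapt_shared_distributions(counts, tie_map, si_dists, floor=1e-5):
    """ML re-estimation of discrete output distributions with sharing.

    counts:   (S, K) expected codeword counts for each output distribution
    tie_map:  (S,) cluster id per distribution; acoustically similar
              distributions share an id (hypothetical, precomputed clustering)
    si_dists: (S, K) speaker-independent output distributions (rows sum to 1)
    """
    tie_map = np.asarray(tie_map, dtype=int)
    n_clusters = int(tie_map.max()) + 1
    pooled = np.zeros((n_clusters, counts.shape[1]))
    np.add.at(pooled, tie_map, counts)         # pool counts within each cluster
    totals = pooled.sum(axis=1)
    adapted = si_dists.astype(float).copy()
    for s, c in enumerate(tie_map):
        if totals[c] > 0:                      # clusters with no data keep SI values
            probs = np.maximum(pooled[c] / totals[c], floor)
            adapted[s] = probs / probs.sum()   # renormalize after flooring
    return adapted
```

Pooling counts before normalizing is what lets a distribution that received few adaptation frames borrow an estimate from similar distributions; clusters that saw no adaptation data at all simply keep their speaker-independent values.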