Restructuring Gaussian mixture density functions in speaker-independent acoustic models

Authors:
Atsushi Nakamura
Affiliations:
NTT Communication Science Laboratories, D-202, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan
Venue:
Speech Communication
Year:
2002

Abstract

In continuous speech recognition that combines hidden Markov models (HMMs), word N-grams, and time-synchronous beam search, a local modeling mismatch in the HMMs often degrades recognition performance. To cope with this problem, this paper proposes a method that uses speech data to restructure the Gaussian mixture output probability density functions (pdfs) of a pre-trained speaker-independent HMM set. In this method, Gaussians are copied from other mixture pdfs, taking the distribution of local errors into account. The resulting mixture pdfs share some Gaussians among several states, while the total number of Gaussians remains unchanged. Furthermore, the distribution of local errors is extracted by comparing the pre-trained HMM set with the speech data used to pre-train it, so no new training data are needed for the restructuring. Experimental results show that the proposed method effectively remedies local modeling mismatches and improves recognition performance.
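The abstract describes the restructured mixtures only at a high level. As a minimal sketch, assuming diagonal-covariance Gaussians stored once in a single shared pool, the Python below shows one way to represent mixtures in which several states refer to the same Gaussian, so that "copying" a Gaussian into another state's mixture leaves the number of distinct Gaussians unchanged. The class and method names (`SharedMixtureSet`, `share_gaussian`, `pick_donor`), the fixed weight given to a copied Gaussian, and the donor-selection rule (highest log density on frames flagged as local errors) are hypothetical illustrations, not the paper's procedure; this abstract does not detail how the distribution of local errors decides which Gaussians are copied.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gaussian:
    """Diagonal-covariance Gaussian stored once in a shared pool."""
    mean: list[float]
    var: list[float]

    def log_pdf(self, x: list[float]) -> float:
        # Sum of per-dimension log densities (diagonal covariance).
        total = 0.0
        for xi, mu, v in zip(x, self.mean, self.var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        return total


@dataclass
class SharedMixtureSet:
    """State mixtures that refer to Gaussians by index into one pool."""
    pool: list[Gaussian]
    # state name -> {gaussian index: mixture weight}
    mixtures: dict[str, dict[int, float]] = field(default_factory=dict)

    def state_log_likelihood(self, state: str, x: list[float]) -> float:
        # log sum_m w_m N(x; mu_m, var_m), computed with log-sum-exp.
        terms = [math.log(w) + self.pool[g].log_pdf(x)
                 for g, w in self.mixtures[state].items()]
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    def share_gaussian(self, state: str, g: int, weight: float) -> None:
        # "Copy" Gaussian g into the state's mixture by reference: the pool
        # is untouched, so the number of distinct Gaussians is unchanged.
        # Existing weights are scaled by (1 - weight) so they still sum to 1.
        mix = self.mixtures[state]
        if g in mix:
            return
        for k in mix:
            mix[k] *= 1.0 - weight
        mix[g] = weight

    def pick_donor(self, state: str, error_frames: list[list[float]]) -> int:
        # Hypothetical criterion: the pool Gaussian not yet in this state
        # with the highest total log density on the error frames.
        # Raises ValueError if the state already uses every pool Gaussian.
        candidates = [g for g in range(len(self.pool))
                      if g not in self.mixtures[state]]
        return max(candidates, key=lambda g: sum(
            self.pool[g].log_pdf(x) for x in error_frames))


# Usage with toy one-dimensional data (all values hypothetical).
pool = [Gaussian([0.0], [1.0]), Gaussian([3.0], [1.0]), Gaussian([6.0], [1.0])]
models = SharedMixtureSet(pool, {"s1": {0: 1.0}, "s2": {1: 0.5, 2: 0.5}})
error_frames = [[5.8], [6.1]]  # frames where state s1 fits poorly
before = models.state_log_likelihood("s1", [6.0])
donor = models.pick_donor("s1", error_frames)
models.share_gaussian("s1", donor, weight=0.2)
after = models.state_log_likelihood("s1", [6.0])
```

Keeping Gaussian parameters once in the pool and only weights per state is what makes sharing cheap in this sketch: the copied Gaussian costs one extra weight entry in the receiving state, and any later update to its mean or variance is seen by every state that references it.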