State-dependent phoneme-based model merging for dialectal Chinese speech recognition

Authors:
Linquan Liu;Thomas Fang Zheng;Wenhu Wu
Affiliations:
Center for Speech Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing;Center for Speech Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing;Center for Speech Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing
Venue:
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Year:
2006

Abstract

To build a dialectal Chinese speech recognizer from a standard Chinese speech recognizer using only a small amount of dialectal Chinese speech, a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated. In this method, a tied state of one or more standard triphones is merged with a state of the dialectal monophone that is identical to the central phoneme of those triphones. The proposed method performs well, but it introduces a Gaussian mixture expansion problem: the merged models contain more mixture components. To deal with this, an acoustic model distance measure, named the pseudo-divergence based distance measure, is proposed based on measuring the difference between Gaussian mixture models, and is then used to reduce the model size with almost no performance degradation for dialectal speech. With only 40 minutes of Shanghai-dialectal Chinese speech, the proposed SDPBMM achieves a significant absolute syllable error rate (SER) reduction of 5.9% for dialectal Chinese and almost no performance degradation for standard Chinese. In combination with an existing adaptation method, a further absolute SER reduction of 1.9% can be achieved.
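
The merging and downsizing steps summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: each state is a diagonal-covariance Gaussian mixture; merging pools the components of the standard tied state and the dialectal monophone state under a hypothetical state-dependent interpolation weight `lam`; and downsizing greedily fuses the closest pair of components. The abstract does not define the pseudo-divergence based distance, so the symmetric Kullback-Leibler divergence between single Gaussians is used here as a stand-in. All names (`Gaussian`, `merge_states`, `sym_kl`, `moment_match`, `downsize`) are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    weight: float        # mixture weight
    mean: List[float]    # mean vector
    var: List[float]     # diagonal covariance (variances)


def merge_states(standard: List[Gaussian], dialect: List[Gaussian],
                 lam: float) -> List[Gaussian]:
    """Pool both mixtures; lam is a hypothetical state-dependent weight."""
    merged = [Gaussian(lam * g.weight, g.mean, g.var) for g in standard]
    merged += [Gaussian((1.0 - lam) * g.weight, g.mean, g.var) for g in dialect]
    return merged


def sym_kl(a: Gaussian, b: Gaussian) -> float:
    """Symmetric KL divergence between two diagonal Gaussians.

    Stand-in for the paper's pseudo-divergence, which is not given here.
    """
    total = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        d2 = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * d2 * (1.0 / va + 1.0 / vb)
    return total


def moment_match(a: Gaussian, b: Gaussian) -> Gaussian:
    """Fuse two components, preserving total weight, mean and variance."""
    w = a.weight + b.weight
    mean = [(a.weight * ma + b.weight * mb) / w for ma, mb in zip(a.mean, b.mean)]
    var = [
        (a.weight * (va + ma * ma) + b.weight * (vb + mb * mb)) / w - m * m
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    ]
    return Gaussian(w, mean, var)


def downsize(mixture: List[Gaussian], target: int) -> List[Gaussian]:
    """Greedily fuse the closest pair until only `target` components remain."""
    comps = list(mixture)
    while len(comps) > max(target, 1):
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                d = sym_kl(comps[i], comps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        fused = moment_match(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [fused]
    return comps


# Toy one-dimensional example: a two-component standard state and a
# one-component dialectal state.
standard_state = [Gaussian(0.5, [0.0], [1.0]), Gaussian(0.5, [5.0], [1.0])]
dialect_state = [Gaussian(1.0, [0.1], [1.0])]
merged_state = merge_states(standard_state, dialect_state, lam=0.7)
compact_state = downsize(merged_state, target=2)
```

In this toy example the merged state grows from two components to three, which is the expansion the abstract refers to, and the greedy step fuses the two nearby components (means 0.0 and 0.1) while leaving the distant one untouched. Moment matching is only one possible way to fuse components; whether the paper fuses or prunes components is not stated in the abstract.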