State-dependent phoneme-based model merging for dialectal Chinese speech recognition

Authors:
Linquan Liu;Thomas Fang Zheng;Wenhu Wu
Affiliations:
Center for Speech and Language Technologies, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China;Center for Speech and Language Technologies, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China;Center for Speech and Language Technologies, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
Venue:
Speech Communication
Year:
2008

Abstract

This paper discusses and evaluates a novel but simple and effective acoustic modeling method called "state-dependent phoneme-based model merging" (SDPBMM), used to build a dialectal Chinese speech recognizer from a small amount of dialectal Chinese speech. In SDPBMM, state-level pronunciation modeling is done by merging a tied state of standard triphones with a state of the dialectal monophone(s). State-level pronunciation modeling acts as the merging criterion for SDPBMM, and it suffers from data sparseness because the dialectal data set is limited. To overcome this problem, a distance-based pronunciation modeling approach is also proposed. With 40 min of Shanghai-dialectal Chinese speech data, SDPBMM achieves a significant absolute syllable error rate (SER) reduction of approximately 7.1% (a relative SER reduction of 14.3%) for Shanghai-dialectal Chinese, without performance degradation for standard Chinese. Experiments show that SDPBMM outperforms Maximum Likelihood Linear Regression (MLLR) adaptation and the Pooled Retraining method by 1.4% and 5.3%, respectively, in terms of SER reduction. When combined with MLLR adaptation, SDPBMM achieves a further absolute SER reduction of 1.4%.
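The abstract does not spell out how the merge is computed, so the following is only a minimal sketch of one plausible reading: the Gaussian mixture of a standard tied state and the mixtures of one or more dialectal monophone states are pooled into a single mixture, with the dialectal components scaled by state-level pronunciation probabilities. The names `GmmState`, `merge_states`, and the interpolation weight `lam` are hypothetical and do not come from the paper; diagonal covariances and the example numbers are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GmmState:
    """Diagonal-covariance GMM for one HMM state (hypothetical container)."""
    weights: List[float]          # mixture weights, expected to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one variance vector per component


def merge_states(
    standard: GmmState,
    dialectal: List[Tuple[GmmState, float]],
    lam: float,
) -> GmmState:
    """Pool a standard tied state with dialectal monophone states.

    `dialectal` pairs each dialectal state with an assumed state-level
    pronunciation probability. `lam` (an assumption, not a value from the
    paper) is the share of probability mass kept by the standard state.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total = sum(prob for _, prob in dialectal)
    if total <= 0.0:
        raise ValueError("pronunciation probabilities must sum to a positive value")

    # Standard components keep a fraction `lam` of the total mass.
    weights = [lam * w for w in standard.weights]
    means = list(standard.means)
    variances = list(standard.variances)

    # Dialectal components share the remaining mass in proportion to
    # their normalized pronunciation probabilities.
    for state, prob in dialectal:
        scale = (1.0 - lam) * prob / total
        weights.extend(scale * w for w in state.weights)
        means.extend(state.means)
        variances.extend(state.variances)

    return GmmState(weights, means, variances)


# Toy example with 1-dimensional features (illustrative numbers only).
standard_state = GmmState([0.6, 0.4], [[0.0], [1.0]], [[1.0], [1.0]])
dialect_a = GmmState([1.0], [[2.0]], [[0.5]])
dialect_b = GmmState([0.5, 0.5], [[3.0], [4.0]], [[0.5], [0.5]])

merged = merge_states(standard_state, [(dialect_a, 0.3), (dialect_b, 0.1)], lam=0.7)
```

Because every input mixture is normalized and the scaling factors sum to 1, the merged weights again form a valid mixture. The standard components stay in the pooled state, which is one way to read the abstract's claim that standard Chinese performance does not degrade.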