State-dependent phoneme-based model merging for dialectal Chinese speech recognition

  • Authors:
  • Linquan Liu; Thomas Fang Zheng; Wenhu Wu

  • Affiliations:
  • Center for Speech and Language Technologies, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China (all authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

This paper discusses and evaluates a novel, simple, and effective acoustic modeling method called "state-dependent phoneme-based model merging" (SDPBMM), which is used to build a dialectal Chinese speech recognizer from a small amount of dialectal Chinese speech. In SDPBMM, state-level pronunciation modeling is done by merging a tied state of standard triphones with a state of one or more dialectal monophones. State-level pronunciation modeling acts as the merging criterion for SDPBMM, and it suffers from data sparseness because the data set is limited. To overcome this problem, a distance-based pronunciation modeling approach is also proposed. Using 40 min of Shanghai-dialectal Chinese speech data, SDPBMM achieves a significant absolute syllable error rate (SER) reduction of approximately 7.1% (a relative SER reduction of 14.3%) for Shanghai-dialectal Chinese, without performance degradation for standard Chinese. Experiments show that SDPBMM outperforms Maximum Likelihood Linear Regression (MLLR) adaptation and the Pooled Retraining method by 1.4% and 5.3%, respectively, in terms of SER reduction. Moreover, when SDPBMM is combined with MLLR adaptation, a further absolute SER reduction of 1.4% can be achieved.
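To make the merging idea concrete, the following is a minimal sketch of how a tied state of a standard triphone might be merged with dialectal monophone states. The abstract does not give the merging formula, so everything here is an assumption for illustration: the linear interpolation with weight `lam`, the use of pronunciation probabilities `pron_probs` as mixing weights, the one-dimensional Gaussians, and all function and variable names are hypothetical rather than the authors' implementation.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (one dimension only, to keep the sketch short)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def state_likelihood(x, mixture):
    """Output probability of an HMM state modeled as a Gaussian mixture.

    mixture is a list of (weight, mean, variance) tuples.
    """
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)

def merge_states(standard, dialectal, pron_probs, lam):
    """Merge one standard tied state with several dialectal monophone states.

    ASSUMPTION: the merged state is a linear interpolation. The standard
    mixture keeps weight lam, and each dialectal mixture receives
    (1 - lam) * P(dialectal state | standard state).
    """
    merged = [(lam * w, m, v) for w, m, v in standard]
    for p, mix in zip(pron_probs, dialectal):
        merged.extend(((1 - lam) * p * w, m, v) for w, m, v in mix)
    return merged

# Hypothetical toy parameters, not taken from the paper.
standard = [(0.6, 0.0, 1.0), (0.4, 1.0, 2.0)]
dialectal = [
    [(1.0, 0.5, 1.0)],
    [(0.5, -1.0, 1.0), (0.5, 2.0, 1.5)],
]
pron_probs = [0.7, 0.3]  # assumed state-level pronunciation probabilities
merged = merge_states(standard, dialectal, pron_probs, lam=0.8)

print(len(merged), "Gaussians in the merged state")
print("mixture weights sum to", round(sum(w for w, _, _ in merged), 6))
```

Under these assumptions the merged state remains a valid mixture (its weights sum to one), and its likelihood is the weighted sum of the standard state's likelihood and the dialectal states' likelihoods. When the pronunciation probabilities are estimated from very little data they become unreliable, which is the sparseness problem the distance-based approach in the abstract is meant to address.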