Acoustic modeling using an extended phone set considering cross-lingual pronunciation variations

  • Authors:
  • Dau-Cheng Lyu;Ren-Yuan Lyu;Ming-Tat Ko

  • Affiliations:
  • Dept. of Electrical Engineering, Chang Gung University, Taiwan;Dept. of Computer Science and Information Engineering, Chang Gung University, Taiwan;Institute of Information Science, Academia Sinica, Taiwan

  • Venue:
  • ICME'09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
  • Year:
  • 2009

Abstract

To address data imbalance in multilingual speech recognition and pronunciation variation across languages, we propose an approach for clustering context-dependent phones from an extended phone set in an acoustic model trained on an unbalanced bilingual corpus. First, we generate an extended phone set through pronunciation modeling based on a confidence measure between Mandarin and Taiwanese. Second, we use two-step agglomerative hierarchical clustering with the delta Bayesian information criterion (delta-BIC) to automatically generate a merged extended phone set (MEPS). Third, we apply a parametric modeling technique, model complexity selection, to increase the final number of Gaussian components according to the training data available in the unbalanced condition. The experimental results show that the proposed automatic extended-phone clustering approach reduces the relative syllable error rate by 8.3% compared with the best result of the decision-tree-based phone clustering approach.
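
The delta-BIC merging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each phone is represented by a matrix of feature frames modeled with a single full-covariance Gaussian, it performs a plain one-step agglomerative merge rather than the two-step procedure of the paper, and the names `delta_bic`, `agglomerative_bic`, and `penalty_weight` are hypothetical. A negative delta-BIC means one shared Gaussian is preferred over two separate ones, so the pair is merged.

```python
import numpy as np


def delta_bic(x1, x2, penalty_weight=1.0):
    """Delta-BIC for merging two sets of frames (rows = frames, cols = dims).

    Assumption: each set is modeled by one full-covariance Gaussian.
    A negative value favors merging the two sets into one model.
    """
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2

    def logdet(x):
        # Maximum-likelihood covariance, lightly regularized for stability.
        cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Free parameters of one Gaussian: mean (d) plus covariance (d(d+1)/2).
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    merged = np.vstack([x1, x2])
    gain = 0.5 * (n * logdet(merged) - n1 * logdet(x1) - n2 * logdet(x2))
    return gain - penalty


def agglomerative_bic(phones, penalty_weight=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative.

    `phones` maps a phone label to its frame matrix; merged labels are
    joined with '+'. Returns the sorted list of resulting cluster labels.
    """
    clusters = {name: np.asarray(x, dtype=float) for name, x in phones.items()}
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                score = delta_bic(clusters[names[i]], clusters[names[j]],
                                  penalty_weight)
                if best is None or score < best[0]:
                    best = (score, names[i], names[j])
        score, a, b = best
        if score >= 0:  # no remaining pair is better modeled jointly
            break
        clusters[a + "+" + b] = np.vstack([clusters.pop(a), clusters.pop(b)])
    return sorted(clusters)
```

Calling `agglomerative_bic` with a dictionary of per-phone frame matrices returns the merged labels; raising `penalty_weight` makes merging more aggressive, which is one way such a criterion could be tuned when one language has far less training data than the other.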