New machine scores and their combinations for automatic Mandarin phonetic pronunciation quality assessment

  • Authors:
  • Fuping Pan;Qingwei Zhao;Yonghong Yan

  • Affiliations:
  • ThinkIT Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;ThinkIT Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;ThinkIT Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part I
  • Year:
  • 2007

Abstract

This paper addresses the assessment of Mandarin vowel pronunciation quality. Phonetic pronunciation quality is traditionally evaluated within the speech recognition framework using the phonetic posterior probability score, which can be computed either by normalizing frame-based posterior probabilities or directly on the phone segment. For vowels, the first method achieves a human-machine scoring correlation coefficient (CC) of 0.832, and the second reaches a CC of up to 0.847. This paper proposes a novel formant feature and applies it to vowel evaluation: the formant plots on the time-frequency plane are transformed into a bitmap, and Gabor features are extracted from the bitmap for pattern classification. Using the resulting classification probability as the pronunciation score yields a CC of 0.842. Finally, the three scores are combined with various linear and nonlinear methods; the best CC, 0.913, is obtained with a neural network.
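
The evaluation metric mentioned in the abstract, the human-machine correlation coefficient, can be illustrated with a minimal sketch. It assumes the CC is the Pearson correlation, and it uses a fixed-weight average as the simplest possible linear combination of three machine scores. The abstract does not specify the paper's actual combination methods or weights, so the function names, the weights, and all numeric values below are hypothetical and are not data from the paper.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def combine_linear(score_rows, weights, bias=0.0):
    """Weighted sum of machine scores; each row holds the scores of one vowel."""
    return [bias + sum(w * s for w, s in zip(weights, row)) for row in score_rows]


# Made-up illustrative values, NOT results or data from the paper.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
machine_rows = [
    # (frame-normalized posterior, segment posterior, Gabor classifier probability)
    (0.20, 0.25, 0.30),
    (0.35, 0.30, 0.45),
    (0.55, 0.60, 0.50),
    (0.70, 0.65, 0.80),
    (0.90, 0.95, 0.85),
]

# Equal weights are an assumption; in practice they would be fit on training data.
combined = combine_linear(machine_rows, weights=(1 / 3, 1 / 3, 1 / 3))
print("CC of combined score:", pearson_cc(human_scores, combined))
```

A nonlinear combiner such as the neural network mentioned in the abstract would replace `combine_linear` with a trained model, while the CC computation would stay the same.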