A new method for mispronunciation detection using Support Vector Machine based on Pronunciation Space Models

Authors:
Si Wei;Guoping Hu;Yu Hu;Ren-Hua Wang
Affiliations:
iFLYTEK Research, No. 616, Huangshan Road, Hefei, Anhui, China and University of Science and Technology of China, No. 96, Jinzhai Road, Hefei, Anhui, China;iFLYTEK Research, No. 616, Huangshan Road, Hefei, Anhui, China;iFLYTEK Research, No. 616, Huangshan Road, Hefei, Anhui, China;University of Science and Technology of China, No. 96, Jinzhai Road, Hefei, Anhui, China
Venue:
Speech Communication
Year:
2009

Abstract

This paper presents two new ideas for text-dependent mispronunciation detection. First, mispronunciation detection is formulated as a classification problem so that various predictive features can be integrated. A Support Vector Machine (SVM) is used as the classifier, and the log-likelihood ratios between all the acoustic models and the model corresponding to the given text are employed as its features. Second, Pronunciation Space Models (PSMs) are proposed to enhance the discriminative capability of the acoustic models for pronunciation variations. In PSMs, each phone is modeled with several parallel acoustic models that represent pronunciation variations of that phone at different proficiency levels, and an unsupervised method is proposed for constructing the PSMs. Experiments on a database of more than 500,000 Mandarin syllables collected from 1335 Chinese speakers show that the proposed methods significantly outperform the traditional posterior-probability-based method. The overall recall rates for the 13 most frequently mispronounced phones increase from 17.2%, 7.6% and 0% to 58.3%, 44.3% and 29.5% at precision levels of 60%, 70% and 80%, respectively. The improvement is also demonstrated by a subjective experiment with 30 subjects, in which 53.3% of the subjects judged the proposed method to be better than the traditional one and 23.3% judged the two methods to be comparable.
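
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name llr_features, the model names and the log-likelihood values are all hypothetical, and it assumes that each acoustic model has already produced a log-likelihood for the speech segment. Each feature is the log-likelihood of one model minus the log-likelihood of the model for the phone given by the text.

```python
from typing import Dict, List


def llr_features(log_likelihoods: Dict[str, float], target_phone: str) -> List[float]:
    """Log-likelihood ratio of every model against the target-phone model.

    log_likelihoods maps a (hypothetical) model name to log p(segment | model).
    target_phone is the phone that the prompt text says should be spoken.
    """
    target = log_likelihoods[target_phone]
    # Sort the model names so the feature order is the same for every segment.
    return [log_likelihoods[name] - target for name in sorted(log_likelihoods)]


# Made-up log-likelihoods for one segment whose expected phone is "zh".
scores = {"zh": -42.0, "z": -40.5, "j": -47.0}
features = llr_features(scores, "zh")
print(features)  # order follows sorted names: j, z, zh
```

A positive entry means that a competing model explains the segment better than the expected one. The entry for the target model itself is always zero and could be dropped. Vectors of this kind, paired with human mispronunciation labels, could then be passed to any SVM implementation for training.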