ASR based pronunciation evaluation with automatically generated competing vocabulary and classifier fusion

  • Authors:
  • Carlos Molina; Néstor Becerra Yoma; Jorge Wuth; Hiram Vivanco

  • Affiliations:
  • Speech Processing and Transmission Laboratory, Department of Electrical Engineering, Universidad de Chile, Santiago, Chile (C. Molina, N. Becerra Yoma, J. Wuth); Department of Linguistics, Universidad de Chile, Santiago, Chile (H. Vivanco)

  • Venue:
  • Speech Communication
  • Year:
  • 2009

Abstract

In this paper, the application of automatic speech recognition (ASR) technology to computer-aided pronunciation training (CAPT) is addressed. A method is proposed to automatically generate the competitive lexicon required by an ASR engine to compare the pronunciation of a target word with its correct and incorrect phonetic realizations. To enable the efficient deployment of CAPT applications, the generation of this competitive lexicon requires no human assistance or a priori information about mother-language-dependent error rules. Moreover, a Bayes-based multi-classifier fusion approach to map objective ASR confidence scores to subjective evaluations in pronunciation assessment is presented. The proposed method for generating a competitive lexicon given a target word leads to average subjective-objective score correlations of 0.67 and 0.82 with five and two levels of pronunciation quality, respectively. Finally, multi-classifier systems (MCS) provide a promising formal framework for combining poorly correlated scores in CAPT. When applied to ASR confidence metrics, MCS can increase the subjective-objective score correlation by 2.4% and reduce the classification error by 10.2% with two pronunciation quality levels.
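
To give a concrete sense of the kind of Bayes-based fusion the abstract refers to, the sketch below shows a generic naive-Bayes combination of two confidence scores into a two-level pronunciation-quality posterior. This is an illustrative assumption, not the paper's actual formulation: the bin count, the likelihood tables, and the `quantize`/`fuse` helpers are all hypothetical, and in practice the per-bin likelihoods P(bin | level) would be estimated from subjectively labeled development data.

```python
import numpy as np

# Hypothetical illustration (not the paper's exact method): naive-Bayes fusion
# of two ASR confidence metrics into a two-level pronunciation-quality decision.
# Per-bin likelihoods P(bin | level) are assumed to come from labeled data.

N_BINS = 10
LEVELS = ["good", "poor"]  # two pronunciation-quality levels

def quantize(score, n_bins=N_BINS):
    """Map a confidence score in [0, 1] to a histogram bin index."""
    return min(int(score * n_bins), n_bins - 1)

def fuse(scores, likelihoods, priors):
    """
    Bayes fusion under a conditional-independence assumption.

    scores      : K confidence scores in [0, 1], one per classifier
    likelihoods : array of shape (K, n_levels, N_BINS) with P(bin | level)
    priors      : array of shape (n_levels,) with P(level)
    Returns the posterior P(level | all scores).
    """
    log_post = np.log(priors)
    for k, s in enumerate(scores):
        b = quantize(s)
        log_post = log_post + np.log(likelihoods[k, :, b] + 1e-12)  # avoid log(0)
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    return post / post.sum()

# Toy example with made-up likelihood tables for two classifiers.
rng = np.random.default_rng(0)
lik = rng.dirichlet(np.ones(N_BINS), size=(2, len(LEVELS)))  # shape (2, 2, N_BINS)
priors = np.array([0.5, 0.5])

posterior = fuse([0.83, 0.71], lik, priors)
print(dict(zip(LEVELS, posterior.round(3))))
```

The conditional-independence assumption is what makes poorly correlated scores attractive for this kind of fusion: each additional metric contributes complementary evidence to the posterior rather than repeating information already carried by the others.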