In second language (L2) learning, a major difficulty is discriminating between the acoustic diversity within an L2 phoneme category and the diversity between different categories. We propose a general method for automatic diagnostic assessment of the pronunciation of non-native speakers, based on models of the human auditory periphery. For each phoneme class separately, the method measures the geometric shape similarity between the native auditory domain and the non-native speech domain. The phonemes on which a set of L2 speakers deviates most from native pronunciation are detected by comparing this similarity measure with the one calculated for native speakers on the same phonemes. We evaluated the system on several groups of non-native speakers with different language backgrounds. The experimental results agree with linguistic findings and with human listeners' ratings, particularly when both the spectral and the temporal cues of the speech signal are used in the pronunciation analysis.
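The comparison step described above can be illustrated with a minimal sketch. The abstract does not define how the geometric shape similarity is computed or how the native and non-native values are compared, so everything here is an assumption: the sketch takes per-phoneme similarity scores as already computed, assumes that a higher score means closer to native, and uses a plain difference of means as the deviation. The function name `rank_deviating_phonemes` and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def rank_deviating_phonemes(native_scores, nonnative_scores):
    """Rank phonemes by how far the non-native similarity falls below the native baseline.

    Both arguments map a phoneme label to a list of similarity scores, one per speaker.
    Assumption: higher score = closer to the native auditory representation.
    """
    gaps = {}
    for phoneme, native_vals in native_scores.items():
        l2_vals = nonnative_scores.get(phoneme)
        if not native_vals or not l2_vals:
            continue  # skip phonemes without data in both groups
        # Assumed deviation measure: native mean minus non-native mean.
        gaps[phoneme] = mean(native_vals) - mean(l2_vals)
    # Largest gap first: these are the phonemes that deviate most.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Made-up scores, for illustration only (not experimental data).
native = {"i": [0.9, 0.8], "y": [0.8, 0.8], "u": [0.7, 0.9]}
l2_group = {"i": [0.8, 0.8], "y": [0.4, 0.6], "u": [0.7, 0.7]}

ranking = rank_deviating_phonemes(native, l2_group)
for phoneme, gap in ranking:
    print(f"{phoneme}: {gap:.2f}")
```

Ranking against the native baseline, rather than thresholding the non-native score alone, reflects the abstract's point that native speakers also vary within a phoneme category. A real system would likely replace the difference of means with a statistic that accounts for that variance.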