Optimizing multiple pronunciation dictionary based on a confusability measure for non-native speech recognition

  • Authors:
  • Mina Kim;Yoo Rhee Oh;Hong Kook Kim

  • Affiliations:
  • Gwangju Institute of Science and Technology (GIST), Gwangju, Korea (all authors)

  • Venue:
  • AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications
  • Year:
  • 2008

Abstract

This paper addresses issues in efficient pronunciation variation modeling for non-native automatic speech recognition (ASR), where non-native speech is characterized largely by pronunciations that differ from native speech. To improve the performance of non-native ASR, a multiple pronunciation dictionary built with an indirect data-driven approach is first proposed. However, the enlarged dictionary increases the search space for ASR decoding. Therefore, we propose a method for optimizing the size of the multiple pronunciation dictionary by removing confusable pronunciation variants. To this end, a confusability measure based on the Levenshtein distance between two pronunciation variants is also proposed. In addition, the number of phonemes in each pronunciation variant is used to optimize the dictionary size. To investigate the effect of the proposed approach on ASR performance, English is selected as the target language, and English utterances spoken by Koreans are used as non-native speech. Continuous non-native ASR experiments show that the ASR system using the optimized multiple pronunciation dictionary achieves an average relative word error rate reduction of 13.53% and a relative reduction in computational complexity of 21.10%, compared with the system using the unoptimized multiple pronunciation dictionary.
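To make the pruning idea concrete, the following is a minimal sketch, assuming a simple per-word criterion: compute the Levenshtein distance between phoneme sequences of pronunciation variants and drop any variant that lies too close to one already kept. The threshold `min_distance`, the function names, and the example lexicon are illustrative assumptions, not taken from the paper; the paper's actual confusability measure and its use of phoneme counts are not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (lists of phoneme symbols)."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]


def prune_variants(variants, min_distance=2):
    """Keep a variant only if it is at least min_distance edits away from every
    variant already kept; the first (baseline) variant is always retained.
    min_distance is a hypothetical parameter, not a value from the paper."""
    kept = []
    for var in variants:
        if all(levenshtein(var, k) >= min_distance for k in kept):
            kept.append(var)
    return kept


# Hypothetical example: pronunciation variants of "data" for non-native speakers.
lexicon = {
    "data": [["d", "ey", "t", "ah"],
             ["d", "ey", "d", "ah"],
             ["d", "aa", "t", "ah"]],
}
pruned = {word: prune_variants(variants) for word, variants in lexicon.items()}
print(pruned)  # variants within one edit of the baseline are removed
```

In this sketch the two near-duplicate variants of "data" are discarded because each differs from the baseline by a single phoneme substitution, which mirrors the idea of removing variants whose confusability with existing entries outweighs their coverage benefit.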