Algorithms: design techniques and analysis
Techniques for high quality Arabic speech synthesis
Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Software engineering: Systems and tools
Speech and Language Processing (2nd Edition)
Automatic speech recognition and speech variability: A review
Speech Communication
Arabic speech and text in TIDES OnTAP
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Automatic Segmentation and Labeling for Spontaneous Standard Malay Speech Recognition
ICACTE '08 Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering
IEICE - Transactions on Information and Systems
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Cross-word Arabic pronunciation variation modeling for speech recognition
International Journal of Speech Technology
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. It causes words to be pronounced differently from the forms listed in the pronunciation dictionary, producing a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The method first performs phoneme recognition and then aligns the observed phoneme sequence produced by the phoneme recognizer with the reference phoneme sequence obtained from the pronunciation dictionary. The unique variants collected in this way are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system built on the Sphinx3 engine. The baseline system uses a 5.4-hour speech corpus of Modern Standard Arabic broadcast news and a pronunciation dictionary of 14,234 canonical pronunciations, and it achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not yield appreciable improvements, the word error rate is significantly reduced, by 2.22%, when the variants are represented within the language model.
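The abstract describes the variant-extraction step only at a high level. The Python sketch below illustrates one way it could work, and it is not the authors' implementation. It assumes that the reference phonemes of an utterance are the concatenated canonical pronunciations of its transcript words. It also assumes the alignment is a standard edit-distance (Needleman-Wunsch style) global alignment, and that reference word boundaries are used to cut the recognized phonemes into per-word variants. The function names, the ASCII phone labels, and the rule that assigns an inserted phone to the preceding word are all hypothetical. The `word(2)` naming of alternate pronunciations follows the convention used in CMU Sphinx dictionaries.

```python
from collections import defaultdict

GAP = "-"  # marks an inserted or deleted phone in an alignment


def align(ref, hyp, sub_cost=1, gap_cost=1):
    """Global edit-distance alignment of two phone lists -> list of (ref, hyp) pairs."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]  # cost[i][j]: align ref[:i] with hyp[:j]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if ref[i - 1] == hyp[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + diag,   # match / substitution
                             cost[i - 1][j] + gap_cost,   # deletion
                             cost[i][j - 1] + gap_cost)   # insertion
    pairs, i, j = [], n, m  # trace back along an optimal path
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = 0 if ref[i - 1] == hyp[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + diag:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((ref[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


def variants_from_utterance(words, dictionary, recognized):
    """Return (word, variant) pairs whose observed phones differ from the canonical form."""
    if not words:
        return []
    ref, owner = [], []  # owner[k] = index of the word that reference phone k belongs to
    for w_idx, word in enumerate(words):
        ref.extend(dictionary[word])
        owner.extend([w_idx] * len(dictionary[word]))
    observed = [[] for _ in words]
    r = 0  # index of the next reference phone
    for ref_phone, hyp_phone in align(ref, recognized):
        if ref_phone == GAP:
            w_idx = owner[max(r - 1, 0)]  # hypothetical rule: insertion joins the previous word
        else:
            w_idx = owner[r]
            r += 1
        if hyp_phone != GAP:
            observed[w_idx].append(hyp_phone)
    return [(w, tuple(obs)) for w, obs in zip(words, observed)
            if obs and tuple(obs) != tuple(dictionary[w])]


def collect_variants(corpus, dictionary):
    """corpus: iterable of (transcript words, recognized phones); keeps unique variants."""
    unique = defaultdict(set)
    for words, recognized in corpus:
        for word, variant in variants_from_utterance(words, dictionary, recognized):
            unique[word].add(variant)
    return unique


def expand_dictionary(dictionary, variants):
    """Sphinx-style dictionary lines; alternates are named word(2), word(3), ..."""
    lines = []
    for word in sorted(dictionary):
        lines.append(f"{word} {' '.join(dictionary[word])}")
        for n, variant in enumerate(sorted(variants.get(word, ())), start=2):
            lines.append(f"{word}({n}) {' '.join(variant)}")
    return lines


if __name__ == "__main__":
    # Hypothetical toy data: the final A of "kataba" and the initial A of
    # "alwalad" are missing from the phoneme recognizer's output.
    dictionary = {"kataba": ["K", "A", "T", "A", "B", "A"],
                  "alwalad": ["A", "L", "W", "A", "L", "A", "D"]}
    corpus = [(["kataba", "alwalad"], ["K", "A", "T", "A", "B", "L", "W", "A", "L", "A", "D"])]
    for line in expand_dictionary(dictionary, collect_variants(corpus, dictionary)):
        print(line)
```

On the toy utterance, the alignment deletes both adjacent `A` phones. The word boundaries then give `kataba(2) K A T A B` and `alwalad(2) L W A L A D` as new dictionary entries. Aligning whole utterances instead of isolated words avoids the need for word-level timing from the phoneme recognizer. That is the main reason to use a global alignment in this sketch.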