Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach

Authors:
Dia Abuzeina; Wasfi Al-Khatib; Moustafa Elshafei; Husni Al-Muhtaseb
Affiliations:
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia (all four authors)
Venue:
International Journal of Speech Technology
Year:
2012

Abstract

Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. It changes how words are actually pronounced away from the forms listed in the pronunciation dictionary, producing a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method first performs phoneme recognition and then aligns the observed phoneme sequence generated by the phoneme recognizer with the reference phoneme sequence obtained from the pronunciation dictionary. The unique variants collected in this way are then added to the dictionary as well as to the language model. We started from a baseline Arabic speech recognition system built on the Sphinx3 engine. The baseline system is based on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations, and achieves a word error rate (WER) of 13.39%. Our results show that while the expanded dictionary alone did not yield appreciable improvement, the WER is significantly reduced, by 2.22%, when the variants are represented within the language model.
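The abstract names the two steps of variant extraction (phoneme recognition, then alignment of observed against reference phonemes) but not the alignment algorithm or its costs. The sketch below is illustrative only and assumes a standard unit-cost edit-distance (Levenshtein) alignment, which is one common choice and not necessarily the authors'. It takes the phoneme recognizer's output as a given list. The phoneme symbols, the romanized example words, and the helper names `align` and `collect_variants` are hypothetical. Attaching an inserted phoneme to the preceding word is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# (reference phoneme, observed phoneme); None marks a gap in either sequence
Pair = Tuple[Optional[str], Optional[str]]


def align(ref: List[str], obs: List[str]) -> List[Pair]:
    """Minimum edit-distance alignment with unit costs (an assumed cost scheme)."""
    n, m = len(ref), len(obs)
    # d[i][j] = cheapest cost of aligning ref[:i] with obs[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace from the bottom-right cell, preferring match/substitution.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != obs[j - 1]):
            pairs.append((ref[i - 1], obs[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, obs[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def collect_variants(
    words: List[Tuple[str, List[str]]], obs: List[str]
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Per word, the set of observed pronunciations that differ from its canonical one."""
    if not words:
        return {}
    ref: List[str] = []
    owner: List[int] = []  # owner[r] = index of the word that ref[r] belongs to
    for k, (_, phones) in enumerate(words):
        ref.extend(phones)
        owner.extend([k] * len(phones))
    observed: List[List[str]] = [[] for _ in words]
    r, current = 0, 0
    for ref_ph, obs_ph in align(ref, obs):
        if ref_ph is not None:
            current = owner[r]
            r += 1
        if obs_ph is not None:
            # Assumption: an inserted phoneme belongs to the most recent reference word.
            observed[current].append(obs_ph)
    variants: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for (word, canonical), pron in zip(words, observed):
        if pron and pron != canonical:
            variants[word].add(tuple(pron))
    return dict(variants)


if __name__ == "__main__":
    # Hypothetical utterance: canonical pronunciations from the dictionary ...
    words = [("kataba", ["K", "A", "T", "A", "B", "A"]),
             ("rajulun", ["R", "A", "J", "U", "L", "U", "N"])]
    # ... and a phoneme recognizer output in which the final N was not realized.
    observed = ["K", "A", "T", "A", "B", "A", "R", "A", "J", "U", "L", "U"]
    print(collect_variants(words, observed))
    # {'rajulun': {('R', 'A', 'J', 'U', 'L', 'U')}}
```

Returning a set per word mirrors the abstract's "unique collected variants": a pronunciation observed many times in the corpus is kept once. Each collected tuple could then be added to the dictionary as an alternate pronunciation; CMU Sphinx dictionaries mark alternates with a parenthesized index, such as `WORD(2)`.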