Phonetically rich and balanced text and speech corpora for Arabic language

  • Authors:
  • Mohammad A. Abushariah
  • Raja N. Ainon
  • Roziati Zainuddin
  • Moustafa Elshafei
  • Othman O. Khalifa

  • Affiliations:
  • Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603 and King Abdullah II School for Information Technology, University of Jordan, Amman, Jord ...
  • Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603
  • Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603
  • Department of Systems Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia 31261
  • Electrical and Computer Engineering Department, Faculty of Engineering, International Islamic University Malaysia, Gombak, Kuala Lumpur, Malaysia 53100

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2012

Abstract

This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The speech corpus contains a total of 415 sentences recorded by 40 Arabic native speakers (20 male and 20 female) from 11 different Arab countries representing three major regions (Levant, Gulf, and Africa). Three hundred and sixty-seven sentences are phonetically rich and balanced, and these are used for training Arabic Automatic Speech Recognition (ASR) systems. A sentence set is phonetically rich in the sense that it contains all phonemes of the Arabic language, and phonetically balanced in the sense that it preserves the phonetic distribution of the Arabic language. The remaining 48 sentences were created for testing purposes; they are largely foreign to the training sentences, with hardly any words in common. To evaluate the speech corpus, Arabic ASR systems were developed using the Carnegie Mellon University (CMU) Sphinx 3 tools at both the training and testing/decoding levels. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone based acoustic models. Based on experimental analysis of about 8 h of training speech data, the acoustic model performs best with a continuous observation probability model of 16 Gaussian mixture distributions and with the state distributions tied to 500 senones. The language model contains uni-grams, bi-grams, and tri-grams. For the same speakers with different sentences, the Arabic ASR systems obtained an average Word Error Rate (WER) of 9.70%; for different speakers with the same sentences, an average WER of 4.58%; and for different speakers with different sentences, an average WER of 12.39%.
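The WER figures reported above are the standard ASR evaluation metric: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of that computation (the example sentences are illustrative, not drawn from the corpus):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One word dropped out of five reference words -> WER = 0.20
print(wer("the corpus contains rich sentences", "the corpus contains sentences"))
```

Averaging this per-utterance (or corpus-level) ratio over the test sets is what yields summary figures such as the 9.70%, 4.58%, and 12.39% reported here.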