Syllable-based automatic Arabic speech recognition in noisy-telephone channel

Authors:
Mohamed Mostafa Azmi;Hesham Tolba;Sherif Mahdy;Mervat Fashal
Affiliations:
Elect. Eng. Dept., Alexandria Higher Institute of Engineering, Alexandria University, Alexandria, Egypt;Elect. Eng. Dept., Faculty of Engineering, Alexandria University, Alexandria, Egypt;IT Dept., Faculty of Information Technology, Cairo University, Alexandria, Egypt;Phonetics Dept., Faculty of Arts, Alexandria University, Alexandria, Egypt
Venue:
WSEAS Transactions on Signal Processing
Year:
2008

Abstract

The performance of speech recognizers trained on high-quality, full-bandwidth speech data usually degrades in real-world environments. Telephone speech recognition is particularly difficult because of the limited bandwidth of transmission channels. In this paper, we concentrate on syllable-based telephone recognition of Egyptian Arabic speech. Arabic spoken digits are described in terms of their constituent phonemes, triphones, syllables and words. A speaker-independent speech recognition system based on hidden Markov models (HMMs) was designed using the Hidden Markov Model Toolkit (HTK). The database used for both training and testing consists of forty-four Egyptian speakers. In a clean environment, experiments show that the recognition rate using syllables exceeded the rates obtained using monophones, triphones and words by 2.68%, 1.19% and 1.79%, respectively. In a noisy telephone channel, syllables likewise exceeded the rates obtained using monophones, triphones and words by 2.09%, 1.5% and 0.9%, respectively. These comparative experiments indicate that using syllables as acoustic units improves the recognition performance of HMM-based ASR systems in noisy environments. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. Moreover, syllable-based recognition uses a relatively small number of units and runs faster than word-based recognition.
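
The four unit inventories compared in the abstract (monophones, triphones, syllables, words) can be illustrated with a minimal Python sketch. It is not the authors' lexicon or code: the romanized phone sequence and syllable split for one digit word are hypothetical placeholders, and the function name `to_triphones` is invented here. The triphone labels follow the common `left-phone+right` word-internal convention used with HTK.

```python
def to_triphones(phones):
    """Expand a monophone sequence into word-internal triphone labels.

    Labels use the 'left-phone+right' convention; phones at the word
    edges get only the context that exists inside the word.
    """
    if len(phones) == 1:
        return list(phones)
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


# Hypothetical, illustrative transcription of a single digit word
# (an assumption for this sketch, not taken from the paper).
WORD = "wahid"
PHONES = ["w", "aa", "h", "i", "d"]
SYLLABLES = [["w", "aa"], ["h", "i", "d"]]

# Number of acoustic units needed to spell this one word at each level.
UNIT_COUNTS = {
    "monophones": len(PHONES),
    "triphones": len(to_triphones(PHONES)),
    "syllables": len(SYLLABLES),
    "words": 1,
}

if __name__ == "__main__":
    print(to_triphones(PHONES))
    print(UNIT_COUNTS)
```

In this toy case the word is covered by two syllable units instead of five phone-level units, which matches the abstract's point that a syllable spans a longer stretch of speech than a phone.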