Syllable-based automatic Arabic speech recognition in noisy-telephone channel

  • Authors:
  • Mohamed Mostafa Azmi;Hesham Tolba;Sherif Mahdy;Mervat Fashal

  • Affiliations:
  • Elect. Eng. Dept., Alexandria Higher Institute of Engineering, Alexandria University, Alexandria, Egypt;Elect. Eng. Dept., Faculty of Engineering, Alexandria University, Alexandria, Egypt;IT Dept., Faculty of Information Technology, Cairo University, Alexandria, Egypt;Phonetics Dept., Faculty of Arts, Alexandria University, Alexandria, Egypt

  • Venue:
  • WSEAS Transactions on Signal Processing
  • Year:
  • 2008

Abstract

Speech recognizers trained on high-quality, full-bandwidth speech data usually perform worse when deployed in real-world environments. Telephone speech recognition in particular is extremely difficult because of the limited bandwidth of the transmission channel. In this paper, we concentrate on telephone recognition of Egyptian Arabic speech using syllables. Arabic spoken digits are described in terms of their constituent phonemes, triphones, syllables and words. A speaker-independent, hidden Markov model (HMM)-based speech recognition system was designed using the Hidden Markov Model Toolkit (HTK). The database used for both training and testing consists of recordings from forty-four Egyptian speakers. In a clean environment, experiments show that the recognition rate obtained with syllables exceeded the rates obtained with monophones, triphones and words by 2.68%, 1.19% and 1.79%, respectively. In a noisy telephone channel, the syllable-based rate likewise exceeded the rates obtained with monophones, triphones and words by 2.09%, 1.5% and 0.9%, respectively. These comparative experiments indicate that using syllables as acoustic units improves the recognition performance of HMM-based ASR systems in noisy environments. A syllable unit spans a longer time frame, typically three phones, and thereby offers a more parsimonious framework for modeling pronunciation variation in spontaneous speech. Moreover, syllable-based recognition uses relatively fewer units and runs faster than word-based recognition.
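
To make the four unit types concrete, the following minimal Python sketch expands one word into monophone, word-internal triphone, syllable and word units. It is an illustration only, not the authors' pipeline: the phone sequence and syllable split are a rough, assumed romanization that is not taken from the paper, the helper names `to_triphones` and `to_syllable_units` are hypothetical, and the underscore-joined syllable labels are an arbitrary naming choice. Only the `left-centre+right` triphone notation follows the HTK convention.

```python
def to_triphones(phones):
    """Expand a phone sequence into word-internal triphones (HTK l-c+r notation).

    Phones at the word edges lack one context, so they become biphones.
    """
    units = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        units.append(left + phone + right)
    return units


def to_syllable_units(syllables):
    """Name each syllable unit by joining its phones (hypothetical naming scheme)."""
    return ["_".join(syllable) for syllable in syllables]


# Assumed transcription, for illustration only (not taken from the paper).
word = "wahid"
phones = ["w", "a", "h", "i", "d"]
syllables = [["w", "a"], ["h", "i", "d"]]

units = {
    "monophones": phones,
    "triphones": to_triphones(phones),
    "syllables": to_syllable_units(syllables),
    "word": [word],
}

for name, sequence in units.items():
    print(f"{name:<10} {len(sequence)} unit(s): {' '.join(sequence)}")
```

For this single word, the syllable level needs two units where the phone-based levels need five, which reflects the abstract's point that a syllable spans several phones and covers a longer stretch of speech per unit.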