Diphone-based concatenative speech synthesis systems for Arabic language

  • Authors:
  • Hazem M. El-Bakry; M. Z. Rashad; Islam R. Isma'il

  • Affiliations:
  • Department of Information Systems, Faculty of Computer Science & Information Systems, Mansoura University, Egypt (all three authors)

  • Venue:
  • CSECS'11/MECHANICS'11 Proceedings of the 10th WSEAS international conference on Circuits, Systems, Electronics, Control & Signal Processing, and Proceedings of the 7th WSEAS international conference on Applied and Theoretical Mechanics
  • Year:
  • 2011

Abstract

The main objective of this paper is to compare two diphone-based concatenative speech synthesis systems for the Arabic language. In concatenative speech synthesis, speech is generated by joining small prerecorded speech units stored in a speech unit inventory. A diphone is a speech unit that begins at the middle of one phoneme and extends to the middle of the following one. Diphones are commonly used in concatenative text-to-speech (TTS) systems because they model co-articulation: the transition into the next phone is contained inside the unit itself. The first synthesizer in this comparison was implemented with the Festival TTS system, and the second with the MARY TTS system. The comparison highlights how the two synthesizers handle some of the challenges of the Arabic language, and how the Festival and MARY TTS systems differ in their digital signal processing (DSP) modules. Finally, the results of applying the diagnostic rhyme test (DRT) to both synthesizers are presented.
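The diphone definition above implies a simple mapping from a phoneme string to the units fetched from the inventory. The following is a minimal Python sketch, assuming a hypothetical `_` silence label, hypothetical diphone names of the form `a-b`, and a toy in-memory inventory of sample lists. The linear crossfade in `concatenate` is only a stand-in for the join step. It is not the DSP used by Festival or MARY and applies no pitch or duration modification.

```python
from typing import Dict, List, Sequence

# Hypothetical pause label; real inventories define their own symbol set.
SILENCE = "_"


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Map a phoneme sequence to diphone names.

    Each diphone spans from the middle of one phoneme to the middle of the
    next, so N phonemes padded with silence at both ends yield N + 1 diphones.
    """
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(units: List[str],
                inventory: Dict[str, List[float]],
                overlap: int = 2) -> List[float]:
    """Join prerecorded unit waveforms, linearly cross-fading `overlap`
    samples at each joint (a simplification, not a real TTS DSP module)."""
    out: List[float] = []
    for name in units:
        wave = inventory[name]  # raises KeyError if the diphone is missing
        n = min(overlap, len(out), len(wave))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight shifts toward the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * wave[i]
        out.extend(wave[n:])  # append the non-overlapping remainder
    return out


# Hypothetical transcription of an Arabic word, for illustration only.
units = to_diphones(["k", "i", "t", "aa", "b"])
print(units)  # ['_-k', 'k-i', 'i-t', 't-aa', 'aa-b', 'b-_']
```

Five phonemes padded with silence give six diphones, `_-k` through `b-_`. Because each unit boundary sits mid-phoneme, every join falls in a relatively stable part of a phone rather than inside a transition.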
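In the DRT, listeners hear a word and choose between two rhyming words that differ in one distinctive feature of the initial consonant. A common way to score such a two-alternative test corrects for guessing as 100·(R − W)/T, where R is the number of right answers, W the number of wrong answers, and T = R + W. The sketch below assumes that convention. The abstract does not state how the paper scored its results, and `drt_score` is a hypothetical helper.

```python
def drt_score(right: int, wrong: int) -> float:
    """Chance-corrected DRT score in percent: 100 * (R - W) / T, T = R + W.

    With two alternatives, pure guessing gives about 50% raw accuracy;
    subtracting wrong answers maps guessing to about 0 and perfect
    discrimination to 100.
    """
    total = right + wrong
    if total == 0:
        raise ValueError("no responses to score")
    return 100.0 * (right - wrong) / total


# Illustrative counts only, not results from the paper.
print(drt_score(90, 10))  # 80.0
```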