Speech Synthesis and Recognition
The main objective of this paper is to provide a comparison between two diphone-based concatenative speech synthesis systems for the Arabic language. In concatenative speech synthesis, speech is generated by joining small prerecorded speech units stored in a speech unit inventory. A diphone is a speech unit that begins at the middle of one phoneme and extends to the middle of the following one. Diphones are commonly used in concatenative text-to-speech (TTS) systems because they model co-articulation by including the transition to the next phone inside the unit itself. The first synthesizer in this comparison was implemented using the Festival TTS system; the second uses the MARY TTS system. The comparison highlights how the two systems handle some of the challenges of the Arabic language and how the Festival and MARY TTS systems differ in their DSP modules. Finally, the results of applying the diagnostic rhyme test (DRT) to both synthesizers are presented.
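The concatenation process described above can be sketched in a few lines. This is a minimal illustration only, not the Festival or MARY pipeline: real systems add prosody prediction, unit selection, and signal smoothing at the joins. The phone names and the toy "waveforms" below are hypothetical.

```python
def phones_to_diphones(phones):
    """Map a phone sequence to the diphone names covering it.

    Each diphone spans from the middle of one phone to the middle of
    the next, so n phones yield n-1 diphones.
    """
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]

def synthesize(phones, inventory):
    """Join prerecorded diphone units (lists of samples) end to end."""
    samples = []
    for name in phones_to_diphones(phones):
        samples.extend(inventory[name])  # raises KeyError if a unit is missing
    return samples

# Tiny fake inventory: each "recording" is just a short list of sample values.
inventory = {
    "sil-s": [0.0, 0.1],
    "s-a":   [0.2, 0.3],
    "a-sil": [0.1, 0.0],
}
wave = synthesize(["sil", "s", "a", "sil"], inventory)
```

Because each unit already contains the transition into the next phone, simple end-to-end joining preserves co-articulation at the unit boundaries, which is the property the abstract attributes to diphone inventories.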