Speech Synthesis and Recognition
The main objective of this paper is to provide a comparison between two diphone-based concatenative speech synthesis systems for the Arabic language. In concatenative speech synthesis, speech is generated by joining small prerecorded speech units stored in a speech unit inventory. A diphone is a speech unit that begins at the middle of one phoneme and extends to the middle of the following one. Diphones are commonly used in concatenative text-to-speech (TTS) systems because they model co-articulation by including the transition to the next phone inside the unit itself. The first synthesizer in this comparison was implemented using the Festival TTS system; the second uses the MARY TTS system. The comparison highlights how the two systems handle some of the challenges of the Arabic language and how the Festival and MARY TTS systems differ in their DSP modules. Finally, the results of applying the diagnostic rhyme test (DRT) to both synthesizers are presented.
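The concatenation process described above can be sketched in a few lines. This is a minimal illustration only, not the Festival or MARY pipeline: real systems add prosody prediction, unit selection, and signal smoothing at the joins. The phone names and the toy "waveforms" below are hypothetical.

```python
def phones_to_diphones(phones):
    """Map a phone sequence to the diphone names covering it.

    Each diphone spans from the middle of one phone to the middle of
    the next, so n phones yield n-1 diphones.
    """
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]

def synthesize(phones, inventory):
    """Join prerecorded diphone units (lists of samples) end to end."""
    samples = []
    for name in phones_to_diphones(phones):
        samples.extend(inventory[name])  # raises KeyError if a unit is missing
    return samples

# Tiny fake inventory: each "recording" is just a short list of sample values.
inventory = {
    "sil-s": [0.0, 0.1],
    "s-a":   [0.2, 0.3],
    "a-sil": [0.1, 0.0],
}
wave = synthesize(["sil", "s", "a", "sil"], inventory)
```

Because each unit already contains the transition into the next phone, simple end-to-end joining preserves co-articulation at the unit boundaries, which is the property the abstract attributes to diphone inventories.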