An overview of text-to-speech synthesis techniques

Authors:
M. Z. Rashad; Hazem M. El-Bakry; Islam R. Isma'il; Nikos Mastorakis
Affiliations:
Department of Computer Science, Faculty of Computer and Information Systems, Mansoura University, Egypt; Department of Information Systems, Faculty of Computer and Information Systems, Mansoura University, Egypt; Department of Information Systems, Faculty of Computer and Information Systems, Mansoura University, Egypt; Technical University of Sofia, Bulgaria
Venue:
CIT'10 Proceedings of the 4th international conference on Communications and information technology
Year:
2010

Abstract

The goal of this paper is to provide a short but comprehensive overview of text-to-speech synthesis by highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, which comprises text analysis, phonetic analysis, and prosodic analysis, is introduced. Then two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained. After that, concatenative synthesis is explored. Compared to rule-based synthesis, concatenative synthesis is simpler because there is no need to determine speech production rules. However, concatenative synthesis introduces two challenges: modifying the prosody of speech units and resolving discontinuities at unit boundaries. Prosodic modification introduces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, addresses this problem by storing numerous instances of each unit with varying prosodies; the unit that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
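The unit selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the `Unit` fields, the two cost functions, and the `join_weight` parameter are all assumptions chosen for illustration. The sketch picks, for each target position, the stored instance whose prosody (here only mean pitch and duration) is closest to the target, while also penalizing pitch jumps at unit boundaries, using a simple dynamic-programming search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded instance of a speech unit (hypothetical structure)."""
    phone: str          # unit label, e.g. a phone or diphone name
    pitch_hz: float     # mean pitch of this recorded instance
    duration_ms: float  # duration of this recorded instance


def target_cost(unit, pitch_hz, duration_ms):
    """Relative mismatch between a stored unit and the target prosody (assumed form)."""
    return (abs(unit.pitch_hz - pitch_hz) / pitch_hz
            + abs(unit.duration_ms - duration_ms) / duration_ms)


def join_cost(left, right):
    """Relative pitch jump at the boundary between two units (assumed form)."""
    return abs(left.pitch_hz - right.pitch_hz) / max(left.pitch_hz, right.pitch_hz)


def select_units(targets, inventory, join_weight=1.0):
    """Choose one stored unit per target so that total cost is minimal.

    targets:   list of (phone, pitch_hz, duration_ms) tuples
    inventory: dict mapping phone -> non-empty list of Unit instances
    """
    if not targets:
        return []

    # table[i][j] = (best cumulative cost ending in candidate j, backpointer)
    table = []
    for i, (phone, pitch, dur) in enumerate(targets):
        row = []
        for unit in inventory[phone]:
            tc = target_cost(unit, pitch, dur)
            if i == 0:
                row.append((tc, None))
                continue
            prev_units = inventory[targets[i - 1][0]]
            best_cost, best_k = min(
                (table[i - 1][k][0] + join_weight * join_cost(prev_units[k], unit), k)
                for k in range(len(prev_units))
            )
            row.append((best_cost + tc, best_k))
        table.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(table[-1])), key=lambda k: table[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i][0]][j])
        j = table[i][j][1]
    path.reverse()
    return path


# Toy inventory: two stored instances per unit, differing in pitch.
inventory = {
    "h": [Unit("h", 100.0, 50.0), Unit("h", 200.0, 50.0)],
    "i": [Unit("i", 110.0, 80.0), Unit("i", 210.0, 80.0)],
}
targets = [("h", 200.0, 50.0), ("i", 205.0, 80.0)]
chosen = select_units(targets, inventory)
```

Because the selected instances already lie close to the target prosody, little or no signal modification would be needed before concatenation, which is the advantage the abstract attributes to unit selection. A real system would use richer features (spectral and phonetic context, for example) in both costs.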