This paper gives a short but comprehensive overview of text-to-speech synthesis, highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, which comprises text analysis, phonetic analysis, and prosodic analysis, is introduced. Two rule-based synthesis techniques, formant synthesis and articulatory synthesis, are then explained. After that, concatenative synthesis is explored. Compared with rule-based synthesis, concatenative synthesis is simpler because there is no need to determine speech production rules. However, it introduces two challenges: modifying the prosody of speech units and resolving discontinuities at unit boundaries. Prosodic modification produces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, addresses this problem by storing numerous instances of each unit with varying prosody; the unit that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
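The unit selection step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Unit` fields, the cost weights, and the use of a pitch difference as the join cost are all assumptions chosen for illustration. The sketch picks, for each target phone, the stored instance that minimizes the sum of a target cost (mismatch with the desired prosody) and a join cost (discontinuity at the unit boundary), using dynamic programming over the candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One stored instance of a speech unit (hypothetical fields)."""
    phone: str
    pitch_hz: float
    duration_ms: float


def target_cost(unit, pitch_hz, duration_ms, w_pitch=1.0, w_dur=0.1):
    # Mismatch between a stored unit and the desired prosody.
    # The weights are illustrative assumptions, not tuned values.
    return (w_pitch * abs(unit.pitch_hz - pitch_hz)
            + w_dur * abs(unit.duration_ms - duration_ms))


def join_cost(left, right):
    # Crude stand-in for boundary discontinuity: pitch jump between units.
    return abs(left.pitch_hz - right.pitch_hz)


def select_units(targets, inventory):
    """Return the lowest-cost unit sequence.

    targets:   list of (phone, pitch_hz, duration_ms) tuples
    inventory: dict mapping phone -> non-empty list of Unit
    """
    if not targets:
        return []
    steps = []       # candidate lists, one per target
    back = []        # backpointers into the previous candidate list
    prev_costs = []  # cumulative cost of the best path ending at each candidate
    for i, (phone, pitch, dur) in enumerate(targets):
        cands = inventory[phone]
        costs, ptrs = [], []
        for unit in cands:
            tc = target_cost(unit, pitch, dur)
            if i == 0:
                costs.append(tc)
                ptrs.append(-1)
            else:
                prev = steps[-1]
                # Best predecessor for this candidate.
                k = min(range(len(prev)),
                        key=lambda k: prev_costs[k] + join_cost(prev[k], unit))
                costs.append(prev_costs[k] + join_cost(prev[k], unit) + tc)
                ptrs.append(k)
        steps.append(cands)
        back.append(ptrs)
        prev_costs = costs
    # Backtrack from the cheapest final candidate.
    j = min(range(len(prev_costs)), key=prev_costs.__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(steps[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

With an inventory holding a low-pitched and a high-pitched instance of each phone, a high-pitched target sequence selects the high-pitched instances, so no prosodic modification of the stored units is required. A real system would use richer features (spectral distance at the join, phonetic context) in both costs.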