An overview of text-to-speech synthesis techniques

  • Authors: M. Z. Rashad; Hazem M. El-Bakry; Islam R. Isma'il; Nikos Mastorakis

  • Affiliations: Department of Computer Science, Faculty of Computer and Information Systems, Mansoura University, Egypt; Department of Information Systems, Faculty of Computer and Information Systems, Mansoura University, Egypt; Department of Information Systems, Faculty of Computer and Information Systems, Mansoura University, Egypt; Technical University of Sofia, Bulgaria

  • Venue: CIT'10 Proceedings of the 4th international conference on Communications and information technology
  • Year: 2010

Abstract

The goal of this paper is to provide a short but comprehensive overview of text-to-speech synthesis by highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, which comprises text analysis, phonetic analysis, and prosodic analysis, is introduced. Then two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained. After that, concatenative synthesis is explored. Compared to rule-based synthesis, concatenative synthesis is simpler since there is no need to determine speech production rules. However, concatenative synthesis introduces two challenges: modifying the prosody of speech units and resolving discontinuities at unit boundaries. Prosodic modification introduces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, solves this problem by storing numerous instances of each unit with varying prosodies; the instance that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
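
As an illustration of the unit selection idea described in the abstract, the following minimal Python sketch picks, for each target phone, the stored instance whose prosody is closest to the target, while also penalising pitch jumps at unit boundaries. It is not taken from the paper: the `Unit` class, the two cost functions, the `join_weight` parameter, and the normalising constants are all assumptions chosen for brevity, and the search is a plain Viterbi pass over the candidate lattice. It also assumes every target phone has at least one stored instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One stored instance of a speech unit (hypothetical, simplified)."""
    phone: str
    pitch_hz: float
    duration_ms: float


def target_cost(unit, pitch_hz, duration_ms):
    # Relative mismatch between the stored prosody and the target prosody.
    # The equal weighting of pitch and duration is an assumption.
    return (abs(unit.pitch_hz - pitch_hz) / pitch_hz
            + abs(unit.duration_ms - duration_ms) / duration_ms)


def join_cost(left, right):
    # Crude stand-in for boundary discontinuity: pitch jump between units.
    # The 100 Hz scale is an arbitrary illustrative constant.
    return abs(left.pitch_hz - right.pitch_hz) / 100.0


def select_units(targets, inventory, join_weight=1.0):
    """Return the lowest-cost unit sequence.

    targets:   list of (phone, pitch_hz, duration_ms) tuples
    inventory: dict mapping phone -> list of Unit instances
    """
    if not targets:
        return []

    # columns[t][i] = (best cumulative cost, backpointer, unit)
    columns = []
    for phone, pitch_hz, duration_ms in targets:
        column = []
        for unit in inventory[phone]:
            tc = target_cost(unit, pitch_hz, duration_ms)
            if not columns:
                column.append((tc, None, unit))
                continue
            prev = columns[-1]
            # Cheapest way to reach this candidate from the previous column.
            back = min(
                range(len(prev)),
                key=lambda i: prev[i][0] + join_weight * join_cost(prev[i][2], unit),
            )
            total = prev[back][0] + join_weight * join_cost(prev[back][2], unit) + tc
            column.append((total, back, unit))
        columns.append(column)

    # Backtrack from the cheapest candidate in the final column.
    idx = min(range(len(columns[-1])), key=lambda i: columns[-1][i][0])
    path = []
    for column in reversed(columns):
        _, back, unit = column[idx]
        path.append(unit)
        idx = back
    path.reverse()
    return path


inventory = {
    "a": [Unit("a", 100.0, 80.0), Unit("a", 200.0, 80.0)],
    "b": [Unit("b", 110.0, 60.0), Unit("b", 210.0, 60.0)],
}
chosen = select_units([("a", 200.0, 80.0), ("b", 200.0, 60.0)], inventory)
print([u.pitch_hz for u in chosen])
```

Because the inventory already holds instances near the requested pitch, no signal-level prosodic modification is needed in this toy case. A real system would use richer features and spectral join costs.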