A framework for mixed-language text-to-speech synthesis

  • Authors:
  • Mario Malcangi; Philip Grew

  • Affiliations:
  • Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy (both authors)

  • Venue:
  • CIMMACS'09 Proceedings of the 8th WSEAS International Conference on Computational intelligence, man-machine systems and cybernetics
  • Year:
  • 2009

Abstract

Text-to-speech (TTS) synthesis usually targets a single language and a single speaker, concatenating short, parametrically controlled speech segments by means of a rule-based algorithm. The main disadvantage of this approach is its strong language and speaker dependency. We propose a framework designed to overcome this limitation by employing a multi-language text-to-speech synthesis system. The framework embeds phonetic and prosodic information in a set of rules, so more than one language can be synthesized by switching from one rule set to another. The system does not depend on phone sets recorded from a specific human voice. Rather, it relies on a human-like speech-synthesis model that can generate the units needed to produce the desired utterance for a specific test string in any kind of voice (male, female, child).
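
The rule-set switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RuleSet` class, the toy grapheme-to-phoneme tables, the phoneme symbols, and the `transcribe` function are all hypothetical names and data chosen for illustration, and prosodic rules and the synthesis model itself are left out. It only shows how the same input word could yield different phoneme sequences depending on which language's rule set is active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical per-language rule set (grapheme-to-phoneme rules only)."""
    language: str
    g2p: dict  # grapheme string -> phoneme symbol; toy data, not from the paper

    def to_phonemes(self, word):
        # Try longer graphemes first so that "ch" wins over "c".
        keys = sorted(self.g2p, key=len, reverse=True)
        text = word.lower()
        phonemes = []
        i = 0
        while i < len(text):
            for key in keys:
                if text.startswith(key, i):
                    phonemes.append(self.g2p[key])
                    i += len(key)
                    break
            else:
                i += 1  # no rule for this character: skip it
        return phonemes


# Assumed toy rule sets; a real system would hold far richer rules.
RULE_SETS = {
    "it": RuleSet("it", {"ch": "k", "i": "i", "a": "a", "o": "o"}),
    "en": RuleSet("en", {"ch": "tS", "i": "I", "a": "{", "o": "Q"}),
}


def transcribe(tagged_words):
    """Map (language, word) pairs to phonemes, switching rule sets per word."""
    return [(word, RULE_SETS[lang].to_phonemes(word)) for lang, word in tagged_words]


if __name__ == "__main__":
    # The same spelling is transcribed differently under each rule set.
    print(transcribe([("it", "chi"), ("en", "chi")]))
```

In this sketch the language tag travels with each word, so a mixed-language sentence is handled by a dictionary lookup per word rather than by a separate synthesizer per language.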