Toward language-independent text-to-speech synthesis

  • Authors:
  • Mario Malcangi; Philip Grew

  • Affiliations:
  • Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy (both authors)

  • Venue:
  • WSEAS Transactions on Information Science and Applications
  • Year:
  • 2010

Abstract

Text-to-speech (TTS) synthesis is becoming a fundamental part of any embedded system that has to interact with humans. Language independence in speech synthesis is a primary requirement for systems that are impractical to update, as is the case for most embedded systems. Because current text-to-speech synthesis usually targets a single language and a single speaker (or at most a limited set of voices), a framework for language-independent text-to-speech synthesis is proposed to overcome these limitations when implementing speech synthesis on embedded systems. The proposed framework was designed to embed phonetic and prosodic information in a set of rules. To complete this language-independent speech-synthesis solution, a universal set of phones has been defined so that the appropriate speech sounds for every language are available at run time. Synthesis of more than one language can easily be carried out by switching from one rule set to another while keeping a common phone-data set. Because it uses a vocal-tract-based speech synthesizer, the system does not depend on phone sets recorded from a specific human voice, so voice types can be chosen at run time.
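
The rule-switching architecture the abstract describes, per-language letter-to-sound rule sets resolved against one shared phone inventory, can be illustrated with a toy sketch. Everything below (the UNIVERSAL_PHONES set, the RULES table, the letters_to_phones function, and the simplified rules themselves) is an illustrative assumption, not the paper's actual rule sets or phone data.

```python
# Toy sketch: language-independent letter-to-sound conversion.
# Only the rule set changes per language; the phone inventory is shared.
# All rules and symbols here are simplified assumptions for illustration.

# Common phone set available to every language (IPA-style labels).
UNIVERSAL_PHONES = frozenset(
    {"a", "e", "i", "o", "u", "p", "b", "t", "d", "k", "g",
     "m", "n", "r", "l", "s", "tʃ", "dʒ"}
)

# Per-language rules as (grapheme, right-context letters, phone).
# Rules are tried in order; the first matching rule wins.
RULES = {
    "it": [
        ("ch", None, "k"),    # Italian "ch" is always /k/ (e.g., "che")
        ("c", "ei", "tʃ"),    # "c" before e/i is /tʃ/ (e.g., "cena")
        ("c", None, "k"),     # "c" elsewhere is /k/ (e.g., "casa")
    ],
    "en": [
        ("ch", None, "tʃ"),   # English "ch" is /tʃ/ (e.g., "chat")
        ("c", "eiy", "s"),    # "c" before e/i/y is /s/ (e.g., "cent")
        ("c", None, "k"),     # "c" elsewhere is /k/ (e.g., "cat")
    ],
}

def letters_to_phones(text: str, lang: str) -> list[str]:
    """Apply the selected language's rules over the shared phone set."""
    rules = RULES[lang]
    phones: list[str] = []
    text = text.lower()
    i = 0
    while i < len(text):
        for grapheme, context, phone in rules:
            if text.startswith(grapheme, i):
                nxt = text[i + len(grapheme): i + len(grapheme) + 1]
                if context is None or (nxt and nxt in context):
                    assert phone in UNIVERSAL_PHONES
                    phones.append(phone)
                    i += len(grapheme)
                    break
        else:
            # Fall back to identity mapping for letters that are phones.
            if text[i] in UNIVERSAL_PHONES:
                phones.append(text[i])
            i += 1
    return phones

if __name__ == "__main__":
    # Switching languages means swapping rule sets, not phone data.
    print(letters_to_phones("cena", "it"))  # ['tʃ', 'e', 'n', 'a']
    print(letters_to_phones("cena", "en"))  # ['s', 'e', 'n', 'a']
```

Running the same input through different rule sets yields different phone sequences drawn from the one shared inventory, which is the runtime switching behavior the abstract describes; prosodic rules and the vocal-tract synthesizer back end would sit downstream of this stage.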