A framework for mixed-language text-to-speech synthesis

Authors:
Mario Malcangi;Philip Grew
Affiliations:
Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy;Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy
Venue:
CIMMACS'09 Proceedings of the 8th WSEAS International Conference on Computational intelligence, man-machine systems and cybernetics
Year:
2009

Abstract

Text-to-speech (TTS) synthesis usually targets a single language and a single speaker, concatenating short, parametrically controlled speech segments by means of a rule-based algorithm. The main disadvantage of this approach is its strong dependence on language and speaker. We propose a framework designed to overcome this limitation by means of a multi-language text-to-speech synthesis system. The framework embeds phonetic and prosodic information in a set of rules, so that more than one language can be synthesized simply by switching from one rule set to another. The system does not depend on phone sets recorded from a specific human voice. Rather, it relies on a human-like speech-synthesis model that can generate the units needed to produce the desired utterance for a specific test string in any kind of voice (male, female, child).
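
The rule-set switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names RULE_SETS, to_phones and mixed_text_to_phones are hypothetical, the rule tables are toy letter-to-phone fragments invented for illustration, and prosodic rules and voice generation are left out entirely. It only shows how the same text front end might produce different phones for the same spelling once the active rule set changes.

```python
# Toy letter-to-phone rule sets, one per language (illustrative assumptions,
# not taken from the paper). Keys are graphemes, values are phone symbols.
RULE_SETS = {
    "it": {"ch": "k", "e": "e"},
    "en": {"ch": "tʃ", "a": "æ", "t": "t"},
}


def to_phones(word, rules):
    """Convert one word to phones using longest-match grapheme rules."""
    graphemes = sorted(rules, key=len, reverse=True)  # try digraphs first
    phones = []
    i = 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                phones.append(rules[g])
                i += len(g)
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phones.append(word[i])
            i += 1
    return phones


def mixed_text_to_phones(segments, rule_sets):
    """Process (language, word) pairs, switching rule set per segment."""
    phones = []
    for language, word in segments:
        phones.extend(to_phones(word.lower(), rule_sets[language]))
    return phones


if __name__ == "__main__":
    # The digraph "ch" is read differently under each rule set.
    print(mixed_text_to_phones([("it", "che"), ("en", "chat")], RULE_SETS))
```

In this sketch the language of each segment is supplied by the caller. How a real system would detect language boundaries in mixed-language text is outside what the abstract states.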