Toward language-independent text-to-speech synthesis

  • Authors:
  • Mario Malcangi; Philip Grew

  • Affiliations:
  • Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy (both authors)

  • Venue:
  • WSEAS Transactions on Information Science and Applications
  • Year:
  • 2010

Abstract

Text-to-speech (TTS) synthesis is becoming a fundamental part of any embedded system that has to interact with humans. Language independence in speech synthesis is a primary requirement for systems that are impractical to update, as is the case for most embedded systems. Because current text-to-speech synthesis usually targets a single language and a single speaker (or at most a limited set of voices), a framework for language-independent text-to-speech synthesis is proposed to overcome these limitations when implementing speech synthesis on embedded systems. The proposed framework was designed to embed phonetic and prosodic information in a set of rules. To complete this language-independent speech-synthesis solution, a universal set of phones has been defined so that the appropriate speech sounds for every language are available at run time. Synthesis of more than one language can easily be carried out by switching from one rule set to another while keeping a common phone-data set. Because it uses a vocal-tract-based speech synthesizer, the system does not depend on phone sets recorded from a specific human voice, so voice types can be chosen at run time.
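
The rule-switching architecture the abstract describes, per-language letter-to-sound rule sets resolved against one shared phone inventory, can be illustrated with a toy sketch. Everything below (the UNIVERSAL_PHONES set, the RULES table, the letters_to_phones function, and the simplified rules themselves) is an illustrative assumption, not the paper's actual rule sets or phone data.

```python
# Toy sketch: language-independent letter-to-sound conversion.
# Only the rule set changes per language; the phone inventory is shared.
# All rules and symbols here are simplified assumptions for illustration.

# Common phone set available to every language (IPA-style labels).
UNIVERSAL_PHONES = frozenset(
    {"a", "e", "i", "o", "u", "p", "b", "t", "d", "k", "g",
     "m", "n", "r", "l", "s", "tʃ", "dʒ"}
)

# Per-language rules as (grapheme, right-context letters, phone).
# Rules are tried in order; the first matching rule wins.
RULES = {
    "it": [
        ("ch", None, "k"),    # Italian "ch" is always /k/ (e.g., "che")
        ("c", "ei", "tʃ"),    # "c" before e/i is /tʃ/ (e.g., "cena")
        ("c", None, "k"),     # "c" elsewhere is /k/ (e.g., "casa")
    ],
    "en": [
        ("ch", None, "tʃ"),   # English "ch" is /tʃ/ (e.g., "chat")
        ("c", "eiy", "s"),    # "c" before e/i/y is /s/ (e.g., "cent")
        ("c", None, "k"),     # "c" elsewhere is /k/ (e.g., "cat")
    ],
}

def letters_to_phones(text: str, lang: str) -> list[str]:
    """Apply the selected language's rules over the shared phone set."""
    rules = RULES[lang]
    phones: list[str] = []
    text = text.lower()
    i = 0
    while i < len(text):
        for grapheme, context, phone in rules:
            if text.startswith(grapheme, i):
                nxt = text[i + len(grapheme): i + len(grapheme) + 1]
                if context is None or (nxt and nxt in context):
                    assert phone in UNIVERSAL_PHONES
                    phones.append(phone)
                    i += len(grapheme)
                    break
        else:
            # Fall back to identity mapping for letters that are phones.
            if text[i] in UNIVERSAL_PHONES:
                phones.append(text[i])
            i += 1
    return phones

if __name__ == "__main__":
    # Switching languages means swapping rule sets, not phone data.
    print(letters_to_phones("cena", "it"))  # ['tʃ', 'e', 'n', 'a']
    print(letters_to_phones("cena", "en"))  # ['s', 'e', 'n', 'a']
```

Running the same input through different rule sets yields different phone sequences drawn from the one shared inventory, which is the runtime switching behavior the abstract describes; prosodic rules and the vocal-tract synthesizer back end would sit downstream of this stage.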