Non-linear frequency scale mapping for voice conversion in text-to-speech system with cepstral description

Authors:
Anna Přibilová; Jiří Přibil
Affiliations:
Department of Radio Electronics, Slovak University of Technology, Ilkovičova 3, 812 19 Bratislava, Slovakia; Institute of Radio Engineering and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, 182 51 Praha 8, Czech Republic
Venue:
Speech Communication
Year:
2006

Abstract

Voice conversion, i.e. modification of a speech signal so that it sounds as if spoken by a different speaker, makes it possible to synthesize speech with a new voice without recording a new database. This paper introduces two new, simple non-linear methods of frequency scale mapping for transforming voice characteristics between male and female or child voices. The methods were developed primarily for the Czech and Slovak text-to-speech (TTS) system designed for blind users and based on the Pocket PC platform. The system uses a cepstral description of the diphone speech inventory of a male speaker, with either the source-filter speech model or the harmonic speech model. Three new diphone speech inventories, corresponding to female, child and young male voices, are created from the original male inventory. Listening tests are used to evaluate the voice transformation and the quality of the synthetic speech.
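The abstract does not specify the two mapping methods, so the following is only a generic sketch of what a non-linear frequency scale mapping can look like, not the authors' algorithm. It assumes a hypothetical piecewise-linear rule: frequencies up to a chosen `knee` are scaled by a factor `alpha`, and the remaining band is stretched or compressed linearly so that the Nyquist frequency maps to itself. The names `warp_frequency`, `inverse_warp`, `warp_envelope`, `alpha`, and `knee` are illustrative assumptions.

```python
def warp_frequency(f, alpha, knee, nyquist):
    """Map a source frequency f (Hz) to a target frequency (Hz).

    Hypothetical piecewise-linear rule (an assumption, not taken from the
    paper): scale by alpha up to `knee`, then continue linearly so that
    `nyquist` maps to itself.
    """
    if not 0.0 <= f <= nyquist:
        raise ValueError("f must lie in [0, nyquist]")
    if not 0.0 < knee < nyquist or not 0.0 < alpha * knee < nyquist:
        raise ValueError("knee and alpha * knee must lie inside (0, nyquist)")
    if f <= knee:
        return alpha * f
    slope = (nyquist - alpha * knee) / (nyquist - knee)
    return alpha * knee + slope * (f - knee)


def inverse_warp(g, alpha, knee, nyquist):
    """Return the source frequency that warp_frequency maps to g."""
    if g <= alpha * knee:
        return g / alpha
    slope = (nyquist - alpha * knee) / (nyquist - knee)
    return knee + (g - alpha * knee) / slope


def warp_envelope(envelope, alpha, knee, nyquist):
    """Resample a uniformly sampled magnitude envelope on the warped axis.

    envelope[k] is the magnitude at k * nyquist / (len(envelope) - 1) Hz.
    Each output bin reads the input envelope at the inverse-warped
    frequency, using linear interpolation between neighbouring bins.
    """
    n = len(envelope)
    step = nyquist / (n - 1)
    warped = []
    for k in range(n):
        src = inverse_warp(k * step, alpha, knee, nyquist) / step
        lo = min(int(src), n - 2)
        frac = src - lo
        warped.append((1.0 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped
```

Under these assumptions, with `alpha = 1.2`, `knee = 4000` and `nyquist = 8000`, a spectral peak at 1000 Hz moves to about 1200 Hz while the band edges at 0 Hz and 8000 Hz stay fixed. A factor above 1 raises formant-like peaks, which is the direction usually associated with a male-to-female or male-to-child conversion; the values here are arbitrary and chosen only for illustration.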