Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Authors:
Gilles Degottex; Pierre Lanchantin; Axel Roebel; Xavier Rodet
Affiliations:
Ircam - CNRS-UMR9912-STMS, Analysis-Synthesis Team, 1 Place Igor Stravinsky, 75004 Paris, France (all authors)
Venue:
Speech Communication
Year:
2013

Abstract

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Because it relies on the LF model, the approach presented here is close to vocoders that use an exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) by spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate that takes into account the amplitude spectra of both the deterministic and the random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated through listening tests in the context of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition.
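The spectral-division idea can be illustrated with a small sketch. The VTF amplitude is obtained by dividing an amplitude envelope, produced by any envelope estimator, by the amplitude spectrum of a glottal source that combines a deterministic and a random component. Everything below is an assumption for illustration, not the authors' implementation: `deterministic_amp` is a hypothetical smooth -12 dB/octave stand-in rather than the actual LF spectrum, the two components are assumed to be uncorrelated so that their expected powers add, and the noise is assumed to be present only above a hypothetical cutoff frequency `cutoff_hz`.

```python
import math


def deterministic_amp(f_hz, fg_hz=150.0):
    # Hypothetical stand-in for the deterministic (LF) source amplitude:
    # a smooth low-pass that falls at -12 dB/octave above fg_hz.
    # This is NOT the actual LF spectrum.
    return 1.0 / (1.0 + (f_hz / fg_hz) ** 2)


def mixed_source_amp(f_hz, noise_level, cutoff_hz, fg_hz=150.0):
    # Assumption: deterministic and random parts are uncorrelated, so their
    # expected powers add. The noise is assumed present only above cutoff_hz.
    det = deterministic_amp(f_hz, fg_hz)
    noise = noise_level if f_hz >= cutoff_hz else 0.0
    return math.sqrt(det ** 2 + noise ** 2)


def vtf_by_spectral_division(freqs_hz, envelope_amp, noise_level, cutoff_hz,
                             fg_hz=150.0):
    # Divide the observed envelope (from ANY envelope estimator) by the
    # mixed-source amplitude at each frequency to get the VTF amplitude.
    return [e / mixed_source_amp(f, noise_level, cutoff_hz, fg_hz)
            for f, e in zip(freqs_hz, envelope_amp)]


if __name__ == "__main__":
    freqs = [100.0 * k for k in range(1, 41)]  # 100 Hz .. 4 kHz grid
    # Toy vocal tract with one resonance near 700 Hz (hypothetical values).
    true_vtf = [1.0 / math.hypot(1.0 - (f / 700.0) ** 2, 0.2 * f / 700.0)
                for f in freqs]
    # Synthetic "observed" envelope = VTF amplitude x source amplitude.
    env = [v * mixed_source_amp(f, 0.01, 2000.0)
           for f, v in zip(freqs, true_vtf)]
    est = vtf_by_spectral_division(freqs, env, 0.01, 2000.0)
    # Largest relative error between recovered and true VTF amplitudes.
    print(max(abs(a - b) / b for a, b in zip(est, true_vtf)))
```

Because the division is done independently at each frequency, the envelope passed in can come from any estimator (cepstral, all-pole, and so on); in the log-amplitude domain the same step is simply a subtraction of the source spectrum in dB.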