INCA algorithm for training voice conversion systems from nonparallel corpora

Authors:
Daniel Erro;Asunción Moreno;Antonio Bonafonte
Affiliations:
AhoLab Signal Processing Laboratory, University of the Basque Country, UPV/EHU, Bilbao, Spain and TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain;TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain;TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. The alignment of phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, which means that both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, it is not possible to obtain such parallel utterances. With an aim towards increasing the versatility of current voice conversion systems, this paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions. This method is based on existing voice conversion techniques, and it does not require any phonetic or linguistic information. Subjective evaluation experiments show that the performance of the resulting voice conversion system is very similar to that of an equivalent system trained on a parallel corpus.
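
The abstract does not spell out the steps of the iterative alignment, so the following is only a rough illustration of the general idea of alternating between pairing vectors and re-estimating a conversion function. It is not the authors' procedure. It assumes nearest-neighbour pairing under squared Euclidean distance and, in place of a Gaussian-mixture conversion function, a simple per-dimension least-squares scale and shift. All function names (`nearest`, `fit_diag_affine`, `convert`, `iterative_align`) are hypothetical.

```python
def nearest(v, pool):
    """Index of the vector in pool closest to v (squared Euclidean distance)."""
    return min(
        range(len(pool)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(v, pool[j])),
    )


def fit_diag_affine(xs, ys):
    """Per-dimension least-squares fit y = a * x + b.

    Assumption: a stand-in for the conversion function; a real system
    would train a GMM-based transform on the paired vectors instead.
    """
    n, dim = len(xs), len(xs[0])
    params = []
    for d in range(dim):
        mx = sum(x[d] for x in xs) / n
        my = sum(y[d] for y in ys) / n
        var = sum((x[d] - mx) ** 2 for x in xs)
        cov = sum((x[d] - mx) * (y[d] - my) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0
        params.append((a, my - a * mx))
    return params


def convert(x, params):
    """Apply the per-dimension scale and shift to one vector."""
    return [a * v + b for v, (a, b) in zip(x, params)]


def iterative_align(source, target, iterations=10):
    """Alternate between pairing and re-estimating the conversion.

    source, target: lists of equal-dimension acoustic vectors taken
    from nonparallel utterances (no sentence-level correspondence).
    Returns (pairs, params), where pairs holds (source_idx, target_idx).
    """
    params = [(1.0, 0.0)] * len(source[0])  # start from the identity
    pairs = []
    for _ in range(iterations):
        # Pair each converted source vector with its closest target vector.
        converted = [convert(x, params) for x in source]
        pairs = [(i, nearest(c, target)) for i, c in enumerate(converted)]
        # Re-estimate the conversion from the ORIGINAL source vectors.
        params = fit_diag_affine(source, [target[j] for _, j in pairs])
    return pairs, params
```

Calling `iterative_align(source, target)` on two lists of feature vectors returns the final index pairs and the fitted parameters. No phonetic labels are used, which mirrors the abstract's statement that the method needs no phonetic or linguistic information.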