Supervisory data alignment for text-independent voice conversion

  • Authors:
  • Jianhua Tao;Meng Zhang;Jani Nurminen;Jilei Tian;Xia Wang

  • Affiliations:
  • National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China;National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China;Nokia Devices R&D, Tampere, Finland;Nokia Research Center, Beijing, China;Nokia Research Center, Beijing, China

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.