Voice conversion using Artificial Neural Networks
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
In this paper, we use artificial neural networks (ANNs) for voice conversion, exploiting the mapping abilities of an ANN model to map the spectral features of a source speaker to those of a target speaker. We conduct a comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM). The results, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as well as a GMM-based VC system, and that the transformed speech is intelligible and possesses the characteristics of the target speaker.

We also address the dependency of voice conversion techniques on parallel data between the source and target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques that capture the speaker-specific characteristics of a target speaker and avoid any need for the source speaker's data, either for training or for adaptation. We therefore propose a voice conversion approach in which an ANN model captures the speaker-specific characteristics of a target speaker, and demonstrate that this approach can perform both monolingual and cross-lingual voice conversion of an arbitrary source speaker.
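The core spectral-mapping step described above can be sketched as a small feed-forward network trained by gradient descent to map a source speaker's spectral feature vectors (e.g. Mel-cepstral coefficients) to a target speaker's. This is a minimal illustrative sketch only: the feature dimension, layer sizes, learning rate, and synthetic "parallel" data below are invented assumptions, not the paper's actual configuration or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 25      # assumed spectral feature dimension (e.g. 25 Mel-cepstral coefficients)
HIDDEN = 50   # assumed hidden-layer width
LR = 0.01     # illustrative learning rate
EPOCHS = 200

# Synthetic stand-in for time-aligned parallel data: source-speaker frames X
# and target-speaker frames Y related by an unknown nonlinear map.
X = rng.standard_normal((500, DIM))
true_W = rng.standard_normal((DIM, DIM)) * 0.3
Y = np.tanh(X @ true_W)

# One-hidden-layer network: y_hat = tanh(x W1 + b1) W2 + b2
W1 = rng.standard_normal((DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, DIM)) * 0.1
b2 = np.zeros(DIM)

def forward(X):
    H = np.tanh(X @ W1 + b1)   # hidden activations
    return H, H @ W2 + b2      # hidden layer and predicted target frames

H, Y_hat = forward(X)
loss_before = float(np.mean((Y_hat - Y) ** 2))

for _ in range(EPOCHS):
    H, Y_hat = forward(X)
    err = (Y_hat - Y) / len(X)          # scaled MSE gradient at the output
    gW2, gb2 = H.T @ err, err.sum(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
    gW1, gb1 = X.T @ dH, dH.sum(axis=0)
    W1 -= LR * gW1; b1 -= LR * gb1
    W2 -= LR * gW2; b2 -= LR * gb2

_, Y_hat = forward(X)
loss_after = float(np.mean((Y_hat - Y) ** 2))
print(f"MSE before: {loss_before:.4f}  after: {loss_after:.4f}")
```

At conversion time, each frame of an arbitrary source utterance would be passed through the trained network and the predicted target spectral features used for resynthesis; pitch and other prosodic features are handled separately and are not shown here.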