Voice conversion using Artificial Neural Networks
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
In this paper, we use artificial neural networks (ANNs) for voice conversion, exploiting the mapping abilities of an ANN model to map the spectral features of a source speaker to those of a target speaker. We conduct a comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM). The results, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as well as a GMM-based VC system, and that the transformed speech is intelligible and possesses the characteristics of the target speaker.

We also address the dependency of voice conversion techniques on parallel data between the source and target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques that capture the speaker-specific characteristics of a target speaker and avoid any need for the source speaker's data, either for training or for adaptation. We therefore propose a voice conversion approach in which an ANN model captures the speaker-specific characteristics of a target speaker, and demonstrate that this approach can perform both monolingual and cross-lingual voice conversion of an arbitrary source speaker.
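The core spectral-mapping step described above can be sketched as a small feed-forward network trained by gradient descent to map a source speaker's spectral feature vectors (e.g. Mel-cepstral coefficients) to a target speaker's. This is a minimal illustrative sketch only: the feature dimension, layer sizes, learning rate, and synthetic "parallel" data below are invented assumptions, not the paper's actual configuration or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 25      # assumed spectral feature dimension (e.g. 25 Mel-cepstral coefficients)
HIDDEN = 50   # assumed hidden-layer width
LR = 0.01     # illustrative learning rate
EPOCHS = 200

# Synthetic stand-in for time-aligned parallel data: source-speaker frames X
# and target-speaker frames Y related by an unknown nonlinear map.
X = rng.standard_normal((500, DIM))
true_W = rng.standard_normal((DIM, DIM)) * 0.3
Y = np.tanh(X @ true_W)

# One-hidden-layer network: y_hat = tanh(x W1 + b1) W2 + b2
W1 = rng.standard_normal((DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, DIM)) * 0.1
b2 = np.zeros(DIM)

def forward(X):
    H = np.tanh(X @ W1 + b1)   # hidden activations
    return H, H @ W2 + b2      # hidden layer and predicted target frames

H, Y_hat = forward(X)
loss_before = float(np.mean((Y_hat - Y) ** 2))

for _ in range(EPOCHS):
    H, Y_hat = forward(X)
    err = (Y_hat - Y) / len(X)          # scaled MSE gradient at the output
    gW2, gb2 = H.T @ err, err.sum(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
    gW1, gb1 = X.T @ dH, dH.sum(axis=0)
    W1 -= LR * gW1; b1 -= LR * gb1
    W2 -= LR * gW2; b2 -= LR * gb2

_, Y_hat = forward(X)
loss_after = float(np.mean((Y_hat - Y) ** 2))
print(f"MSE before: {loss_before:.4f}  after: {loss_after:.4f}")
```

At conversion time, each frame of an arbitrary source utterance would be passed through the trained network and the predicted target spectral features used for resynthesis; pitch and other prosodic features are handled separately and are not shown here.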