Comparing ANN and GMM in a voice conversion framework

Authors:
R. H. Laskar; D. Chakrabarty; F. A. Talukdar; K. Sreenivasa Rao; K. Banerjee
Affiliations:
Department of Electronics & Communication Engineering, National Institute of Technology Silchar, Silchar 788010, Assam, India; Department of Electronics & Communication Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India; Department of Electronics & Communication Engineering, National Institute of Technology Silchar, Silchar 788010, Assam, India; School of Information Technology, IIT Kharagpur, Kharagpur 721302, West Bengal, India; Department of Electronics & Communication Engineering, National Institute of Technology Silchar, Silchar 788010, Assam, India
Venue:
Applied Soft Computing
Year:
2012

Abstract

In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for the design of a voice conversion system using line spectral frequencies (LSFs) as feature vectors. Both the ANN-based and GMM-based models are explored to capture nonlinear mapping functions for modifying the vocal tract characteristics of a source speaker according to a desired target speaker. The LSFs are used to represent the vocal tract transfer function of a particular speaker. Mapping of the intonation patterns (pitch contour) is carried out using a codebook-based model at the segmental level. The energy profile of the signal is modified using a fixed scaling factor defined between the source and target speakers at the segmental level. Two residual modification methods, residual copying and residual selection, are used to generate the target residual signal. The performance of the ANN-based and GMM-based voice conversion (VC) systems is evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using the LSF feature set may be used as an alternative to the state-of-the-art GMM-based models used to design a voice conversion system.
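
The energy modification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the fixed scaling factor is computed, so the definition used here (ratio of average target frame energy to average source frame energy) is an assumption, as are the function names and the 160-sample frame length; none of these come from the paper.

```python
import numpy as np

def frame_energies(signal, frame_len):
    """Mean-square energy of each non-overlapping frame (trailing samples dropped)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def energy_scale_factor(source, target, frame_len=160):
    """Hypothetical fixed factor: average target energy over average source energy.

    This definition is an assumption for illustration, not the paper's formula.
    """
    e_src = np.mean(frame_energies(source, frame_len))
    e_tgt = np.mean(frame_energies(target, frame_len))
    return e_tgt / e_src

def apply_energy_scaling(signal, factor):
    """Scale amplitudes by sqrt(factor) so that the energy scales by factor."""
    return np.sqrt(factor) * np.asarray(signal, dtype=float)
```

In this sketch the factor would be estimated once from training speech of the two speakers and then applied unchanged to every converted segment, which is one plausible reading of "fixed".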