Voice transformation by mapping the features at syllable level

Authors:
K. Sreenivasa Rao;R. H. Laskar;Shashidhar G. Koolagudi
Affiliations:
School of Information Technology, IIT Kharagpur, Kharagpur, West Bengal, India;Department of Electrical Engineering, NIT Silchar, Silchar, Assam, India;School of Information Technology, IIT Kharagpur, Kharagpur, West Bengal, India
Venue:
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Year:
2007

Abstract

Voice transformation involves modifying the voice of a source speaker so that it sounds like the voice of a target speaker. The voice characteristics of a speaker depend on the shape of the glottal pulse (source characteristics), the shape of the vocal tract system (system characteristics), and the long-term (prosodic or suprasegmental) features of the speech signal produced by the speaker. In this paper, we propose mapping functions to transform the vocal tract characteristics and the intonation characteristics from the source speaker to the target speaker. The mapping functions are developed from features extracted at the syllable level. The shape of the vocal tract system is characterized by linear prediction coefficients, and the corresponding mapping function is realized by a five-layer feedforward neural network. The intonation characteristics (pitch contour) are mapped by associating codebooks derived from the pitch contours of the source and target speakers. The proposed mapping functions are used in a voice transformation task. The target speaker's speech is synthesized and evaluated using listening tests. The results of the listening tests indicate that the proposed voice transformation maps the voice characteristics better than the earlier method proposed by the author. The original speech signals and the speech signals synthesized using the mapping functions are available for listening at http://shilloi.iitg.ernet.in/~ksrao/result.html
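
The abstract states only that the pitch contours are mapped by associating codebooks derived from the source and target speakers. The following is a minimal sketch of one way such an association could work, not the authors' implementation. It assumes that syllable-level pitch contours are available as paired source/target examples, that each contour is resampled to a fixed length, that the source codebook is built with a plain k-means, and that each target codeword is the mean of the target contours paired with one source cluster. The function names, the toy data, and all parameter values are hypothetical.

```python
import numpy as np

def resample_contour(f0, n_points):
    """Linearly resample a variable-length pitch contour to n_points samples."""
    f0 = np.asarray(f0, dtype=float)
    old_axis = np.linspace(0.0, 1.0, num=len(f0))
    new_axis = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_axis, old_axis, f0)

def nearest(codebook, x):
    """Index of the codeword closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def train_codebooks(src, tgt, k, n_iter=20):
    """Build a source codebook with k-means and an associated target codebook.

    src, tgt: arrays of shape (n_syllables, n_points); row i of tgt is the
    target-speaker contour paired with row i of src (assumed alignment).
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    # Deterministic initialisation (an assumption of this sketch): pick
    # contours spread evenly over the range of mean pitch values.
    order = np.argsort(src.mean(axis=1))
    picks = np.round(np.linspace(0, len(src) - 1, k)).astype(int)
    src_cb = src[order[picks]].copy()
    for _ in range(n_iter):
        labels = np.array([nearest(src_cb, x) for x in src])
        for j in range(k):
            if np.any(labels == j):
                src_cb[j] = src[labels == j].mean(axis=0)
    labels = np.array([nearest(src_cb, x) for x in src])
    # Target codeword j is the mean of the target contours paired with
    # source cluster j; an empty cluster falls back to the source codeword.
    tgt_cb = src_cb.copy()
    for j in range(k):
        if np.any(labels == j):
            tgt_cb[j] = tgt[labels == j].mean(axis=0)
    return src_cb, tgt_cb

def map_contour(f0, src_cb, tgt_cb):
    """Replace a source contour by the target codeword of its nearest source codeword."""
    x = resample_contour(f0, src_cb.shape[1])
    return tgt_cb[nearest(src_cb, x)]

# Made-up pitch contours in Hz (three points per syllable), for illustration only.
src = [[100, 110, 120], [102, 112, 122], [200, 190, 180], [202, 192, 182]]
tgt = [[150, 165, 180], [152, 167, 182], [260, 250, 240], [262, 252, 242]]
src_cb, tgt_cb = train_codebooks(src, tgt, k=2)
converted = map_contour([100, 105, 110, 115, 120], src_cb, tgt_cb)
print(converted)
```

In this toy setup, a rising low-pitched source contour is replaced by the averaged target contour of the matching cluster. A practical system would need many more codewords and a way to smooth the converted contour across syllable boundaries, neither of which is described in the abstract.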