The basic goal of a voice conversion system is to modify the speaker-specific characteristics of a speech signal while keeping the message and the environmental information intact. Speaker characteristics are reflected in speech at different levels: the shape of the glottal pulse (excitation source characteristics), the shape of the vocal tract (vocal tract system characteristics), and long-term features (suprasegmental or prosodic characteristics). In this paper, we propose neural network models for developing mapping functions at each of these levels. The features used for developing the mapping functions are extracted using pitch synchronous analysis, which provides accurate estimates of the vocal tract parameters by analyzing the speech signal independently in each pitch period, without influence from adjacent pitch cycles. In this work, the instants of significant excitation are used as pitch markers to perform the pitch synchronous analysis. The instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech, and to random excitations such as the onset of a burst in nonvoiced speech. They are computed from the linear prediction (LP) residual of the speech signal using the average group-delay property of minimum phase signals. Line spectral frequencies (LSFs) are used to represent the vocal tract characteristics and to develop the associated mapping function. The LP residual of the speech signal is viewed as the excitation source, and the residual samples around the instant of glottal closure are used for mapping. Prosodic parameters at the syllable and phrase levels are used for deriving the prosodic mapping function. The source and system level mapping functions are derived pitch synchronously, and the target prosodic parameters are also incorporated pitch synchronously using the instants of significant excitation.
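The pitch synchronous LP/LSF front end described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes numpy, uses the standard autocorrelation method for LP analysis, and converts the LP polynomial to LSFs via the symmetric/antisymmetric polynomials P(z) and Q(z). The epoch locations in the demo are hypothetical pitch markers; in the paper they would come from the group-delay based epoch extractor.

```python
import numpy as np

def lp_coefficients(frame, order=10):
    """Autocorrelation-method LP analysis of one pitch-synchronous frame.
    Returns A(z) = [1, -a1, ..., -ap]."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz normal equations R a = r[1:p+1], lightly regularized
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lp_to_lsf(a):
    """LP polynomial -> line spectral frequencies (radians in (0, pi)).
    P(z) = A(z) + z^-(p+1) A(z^-1),  Q(z) = A(z) - z^-(p+1) A(z^-1);
    the LSFs are the angles of their unit-circle roots, interleaved."""
    a_rev = a[::-1]
    P = np.concatenate((a, [0.0])) + np.concatenate(([0.0], a_rev))
    Q = np.concatenate((a, [0.0])) - np.concatenate(([0.0], a_rev))
    lsf = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one of each conjugate pair; drop the trivial roots at z = +/-1
        lsf.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(lsf))

if __name__ == "__main__":
    fs = 8000
    n = np.arange(fs)
    # toy "voiced" signal; real epochs come from a group-delay epoch extractor
    signal = np.sin(2 * np.pi * 120 * n / fs)
    epochs = np.arange(400, len(signal) - 400, fs // 120)  # hypothetical markers
    for e in epochs[:3]:
        # analyze a two-pitch-period frame anchored on each epoch
        lsf = lp_to_lsf(lp_coefficients(signal[e - 120:e + 120], order=10))
```

Because the autocorrelation method yields a minimum-phase A(z), the roots of P(z) and Q(z) lie on the unit circle, so an order-p analysis yields exactly p sorted LSFs per epoch-anchored frame.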
The performance of the voice conversion system is evaluated using listening tests. The prediction accuracy of the mapping functions (neural network models) used at the different levels of the proposed system is further evaluated using objective measures such as deviation (D_i), root mean square error (μ_RMSE), and correlation coefficient (γ_X,Y). The proposed approach (mapping and modification of parameters using the pitch synchronous approach) is shown to perform better than the author's earlier method, which mapped the vocal tract parameters using block processing.
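The objective measures named above can be computed as in the sketch below. The formulations are the standard ones (percentage deviation, RMSE, Pearson correlation) and are assumptions here; the paper's exact definitions of D_i, μ_RMSE, and γ_X,Y may differ in detail.

```python
import numpy as np

def objective_measures(pred, target):
    """Error measures between predicted and target parameter contours
    (standard formulations assumed, not taken from the paper)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    deviation = 100.0 * np.abs(pred - target) / np.abs(target)  # D_i, percent
    rmse = np.sqrt(np.mean((pred - target) ** 2))               # mu_RMSE
    corr = np.corrcoef(pred, target)[0, 1]                      # gamma_{X,Y}
    return deviation, rmse, corr
```

Such measures would be applied per parameter stream, e.g. comparing predicted and target LSF trajectories or pitch contours over a held-out test set.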