In this paper, we describe a statistical approach to both articulatory-to-acoustic mapping and acoustic-to-articulatory inversion mapping that requires no phonetic information. The joint probability density of articulatory and acoustic parameters is modeled with a Gaussian mixture model (GMM) trained on a parallel acoustic-articulatory speech database. We apply GMM-based mapping under the minimum mean-square error (MMSE) criterion, originally proposed for voice conversion, to both mappings. To further improve mapping performance, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method: a target parameter trajectory with appropriate static and dynamic properties is determined by imposing an explicit relationship between static and dynamic features. Experimental results demonstrate that MLE-based mapping with dynamic features significantly improves performance over MMSE-based mapping in both the articulatory-to-acoustic and inversion directions.
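
As a rough illustration of the GMM-based MMSE mapping described above, the sketch below fits a joint Gaussian mixture to concatenated articulatory-acoustic frames and maps new articulatory frames to acoustic estimates as posterior-weighted conditional means. It is a minimal sketch assuming NumPy, SciPy, and scikit-learn; the function and variable names (train_joint_gmm, mmse_map, X, Y) are illustrative rather than taken from the paper, and the MLE/dynamic-feature extension is noted after the code.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(X, Y, n_mix=32):
    # Fit a full-covariance GMM to joint [articulatory; acoustic] frames.
    # X: (T, dx) articulatory frames; Y: (T, dy) acoustic frames (parallel data).
    Z = np.hstack([X, Y])
    gmm = GaussianMixture(n_components=n_mix, covariance_type="full", random_state=0)
    gmm.fit(Z)
    return gmm

def mmse_map(gmm, X, dx):
    # MMSE estimate: E[y|x] = sum_m P(m|x) * (mu_y_m + S_yx_m S_xx_m^-1 (x - mu_x_m)).
    mu_x = gmm.means_[:, :dx]             # (M, dx) source-side mixture means
    mu_y = gmm.means_[:, dx:]             # (M, dy) target-side mixture means
    S_xx = gmm.covariances_[:, :dx, :dx]  # (M, dx, dx)
    S_yx = gmm.covariances_[:, dx:, :dx]  # (M, dy, dx)
    M = gmm.n_components
    # Mixture posteriors P(m|x) from the marginal Gaussians on the x block.
    log_p = np.stack([multivariate_normal.logpdf(X, mu_x[m], S_xx[m])
                      for m in range(M)], axis=1) + np.log(gmm.weights_)
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted conditional means give the MMSE mapping.
    Y_hat = np.zeros((X.shape[0], mu_y.shape[1]))
    for m in range(M):
        A = S_yx[m] @ np.linalg.inv(S_xx[m])
        Y_hat += post[:, [m]] * (mu_y[m] + (X - mu_x[m]) @ A.T)
    return Y_hat

The MLE-based variant in the paper goes a step further: it forms per-frame conditional Gaussians over joint static-and-delta target features and then solves for the static trajectory that maximizes likelihood under the explicit static-dynamic constraint. In the standard parameter-generation formulation this reduces to the closed form y = (W' D^-1 W)^-1 W' D^-1 mu, where W is the matrix appending delta features to the static trajectory and D^-1 collects the frame-wise precisions; that step is omitted from the sketch above.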