A Model-Based Learning Process for Modeling Coarticulation of Human Speech

Authors:
Jianguo Wei; Xugang Lu; Jianwu Dang
Venue:
IEICE - Transactions on Information and Systems
Year:
2007

Citing 8

Machine Learning: An Artificial Intelligence Approach, Volume II
Some properties of the bilevel programming problem. Journal of Optimization Theory and Applications
Statistical methods for speech recognition
Direct search methods: then and now. Journal of Computational and Applied Mathematics - Special issue on numerical analysis 2000 Vol. IV: optimization and nonlinear equations
Automatic Speech Recognition: The Development of the Sphinx Recognition System
Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM Journal on Optimization
Practical Bilevel Optimization: Algorithms and Applications (Nonconvex Optimization and Its Applications)
Finding Optimal Algorithmic Parameters Using Derivative-Free Optimization. SIAM Journal on Optimization

Cited 1

Combining automatic acquisition of knowledge with machine learning approaches for multilingual temporal recognition and normalization. Information Sciences: an International Journal

Abstract

Machine learning techniques have long been applied successfully in many fields. A learning process generally aims to obtain a set of parameters from a given data set by minimizing an objective function, so that the parameters explain the data in a maximum-likelihood or minimum-estimation-error sense. However, most learned parameters are highly data dependent and rarely reflect the true physical mechanism underlying the observed data. To capture the knowledge inherent in the observed data, it is necessary to combine physical models with the learning process rather than merely fitting the observations with a black-box model. To reveal the underlying properties of human speech production, we proposed a learning process based on a physiological articulatory model and a coarticulation model, both of which are derived from human mechanisms. A two-layer learning framework was designed: the physiological articulatory model is used to learn the parameters at the physiological level, and the coarticulation model is used to learn the parameters at the motor planning level. The learning process was carried out on an articulatory database of human speech production. The learned parameters were evaluated by numerical experiments and listening tests. The phonetic targets obtained in the planning stage provide evidence for understanding the virtual targets of human speech production. As a result, the model-based learning process reveals the inherent mechanism of human speech through learned parameters that carry physical meaning.
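
The idea of a two-layer learning process can be illustrated with a minimal toy sketch. It is not the authors' method: the abstract does not specify the physiological articulatory model or the coarticulation model, so this example substitutes a hypothetical first-order smoothing model in which each observed sample is pulled toward the current phoneme's target. All names (`smoothing_matrix`, `fit_targets`, `compass_search`) and the synthetic data are assumptions made for illustration. The inner layer fits the phonetic targets by least squares for a fixed smoothing coefficient `c`, and the outer layer searches over `c` with a simple compass search, one basic derivative-free direct search method.

```python
import numpy as np


def smoothing_matrix(labels, n_targets, c):
    """Linear map from per-phoneme targets to a trajectory (toy assumption):
    y[0] = target[labels[0]], y[t] = c * y[t-1] + (1 - c) * target[labels[t]]."""
    A = np.zeros((len(labels), n_targets))
    A[0, labels[0]] = 1.0
    for t in range(1, len(labels)):
        A[t] = c * A[t - 1]
        A[t, labels[t]] += 1.0 - c
    return A


def fit_targets(y, labels, n_targets, c):
    """Inner layer: least-squares targets for a fixed coefficient c."""
    A = smoothing_matrix(labels, n_targets, c)
    targets, *_ = np.linalg.lstsq(A, y, rcond=None)
    err = float(np.sum((A @ targets - y) ** 2))
    return targets, err


def compass_search(f, x0, step=0.2, lo=0.0, hi=0.95, tol=1e-4):
    """Outer layer: 1-D compass search; halve the step when no neighbor improves."""
    x, fx = x0, f(x0)
    while step > tol:
        improved = False
        for cand in (x + step, x - step):
            if lo <= cand <= hi:
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


# Synthetic, noise-free data (hypothetical values, not from the paper).
labels = [0] * 10 + [1] * 10 + [2] * 10
true_targets = np.array([1.0, -0.5, 2.0])
true_c = 0.6
y = smoothing_matrix(labels, 3, true_c) @ true_targets

# Outer objective: residual of the best inner fit for a given c.
c_hat, err_hat = compass_search(lambda c: fit_targets(y, labels, 3, c)[1], x0=0.5)
targets_hat, _ = fit_targets(y, labels, 3, c_hat)
print("estimated c:", c_hat)
print("estimated targets:", targets_hat)
```

In this toy setting the recovered targets lie beyond the values the smoothed trajectory actually reaches, which loosely mirrors the notion of virtual targets mentioned in the abstract. A real system would replace both the smoothing model and the search procedure with the models described in the paper.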