Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion
NOLISP'07: Proceedings of the 2007 International Conference on Advances in Nonlinear Speech Processing
The authors describe an efficient procedure for acoustic-to-articulatory parameter mapping using neural networks. An assembly of multilayer perceptrons, each assigned to a specific region of the articulatory space, maps acoustic parameters of speech onto vocal tract areas. The model is trained in two stages: first on a codebook of suitably normalized articulatory parameters, then on real speech data to refine the mapping. Acoustic-to-articulatory mapping is in general non-unique, since several vocal tract shapes can produce identical spectral envelopes, and the model accommodates this ambiguity. During synthesis, networks are selected by dynamic programming using a criterion that keeps vocal tract shapes varying smoothly while maintaining a good spectral match.
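The dynamic-programming selection step can be sketched as a Viterbi-style search. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact criterion, so here the cost is assumed to be each network's precomputed spectral error per frame (the hypothetical input `spectral_error`) plus a weight `smoothness_weight` (also hypothetical) times the squared Euclidean change in vocal tract areas between consecutive frames. The function names `select_networks` and `squared_distance` are illustrative only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vocal tract area vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_networks(
    spectral_error: Sequence[Sequence[float]],
    areas: Sequence[Sequence[Vector]],
    smoothness_weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one network index per frame by dynamic programming.

    Assumed inputs (hypothetical, not from the source):
      spectral_error[t][k]: spectral mismatch of network k's output at frame t.
      areas[t][k]: vocal tract area vector proposed by network k at frame t.
    Assumed cost: sum of spectral errors along the path plus
      smoothness_weight * sum of squared frame-to-frame area changes.
    """
    n_frames, n_nets = len(spectral_error), len(spectral_error[0])
    # Best cost of any path that ends in network k at frame 0.
    cost = list(spectral_error[0])
    backpointers: List[List[int]] = []

    for t in range(1, n_frames):
        new_cost, pointers = [], []
        for k in range(n_nets):
            # Cost of arriving at network k (frame t) from each network j (frame t-1).
            candidates = [
                cost[j] + smoothness_weight * squared_distance(areas[t - 1][j], areas[t][k])
                for j in range(n_nets)
            ]
            best_j = min(range(n_nets), key=candidates.__getitem__)
            new_cost.append(candidates[best_j] + spectral_error[t][k])
            pointers.append(best_j)
        cost = new_cost
        backpointers.append(pointers)

    # Trace the optimal path back from the cheapest final state.
    last = min(range(n_nets), key=cost.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]


if __name__ == "__main__":
    # Two networks proposing constant shapes [1, 1] and [2, 2] over three frames.
    errors = [[0.0, 1.0], [0.5, 0.4], [0.0, 1.0]]
    shapes = [[[1.0, 1.0], [2.0, 2.0]]] * 3
    path, total = select_networks(errors, shapes, smoothness_weight=1.0)
    print(path, total)  # prints: [0, 0, 0] 0.5
```

In this toy example a frame-by-frame greedy choice would switch to network 1 at the middle frame (spectral error 0.4 instead of 0.5), but under the assumed smoothness penalty the two shape jumps cost far more, so the search stays with network 0. With `smoothness_weight=0.0` the same call returns the greedy path `[0, 1, 0]`.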