Vowel production manifold: intrinsic factor analysis of vowel articulation

  • Authors:
  • Xugang Lu; Jianwu Dang

  • Affiliations:
  • Spoken Language Communication, Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, Kyoto, Japan; Japan Advanced Institute of Science and Technology, Ishikawa, Japan

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010

Abstract

How the organization of vowels in a compact space reflects the relationship between their production and perception remains to be clarified in speech science, although vowels are believed to occupy a compact space with a regular structure rather than an unstructured blob. In the articulatory domain, traditional representations such as those based on tongue position analysis and linear factor analysis are used. However, the former only partially encodes information on vowel articulation, and the latter reflects only the linear degrees of freedom of vowel articulation. Since nonlinear degrees of freedom exist in vowel production, traditional linear factor analysis is not suitable. In this paper, we propose the use of Laplacian eigenmaps to analyze the intrinsic factors affecting vowel articulation and to obtain a compact manifold representation of vowels. On the manifold, vowels occupy distinct cluster positions that depend on the similarities in their production and perception. On the basis of vowel articulation, we find that the first dimension of the manifold structure is related to tongue height and the second dimension to mouth opening. The third dimension is related to the articulation location of vowels along the vocal tract, which is curved along the manifold. A similar topological manifold structure is explored in the vowel acoustic space. Quantitatively, on the basis of the conditional entropy criterion and vowel identification experiments, we confirm that the analyzed compact manifold structure encodes more information on vowels than traditional representations do.
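
As a rough illustration of the Laplacian eigenmaps technique named in the abstract, the sketch below embeds a small synthetic point set using NumPy. It is a generic textbook-style formulation, not the authors' implementation: the function name `laplacian_eigenmaps`, the parameters `n_neighbors` and `sigma`, the heat-kernel weighting, and the toy curve data are all assumptions made for illustration, and the data are not articulatory measurements.

```python
import numpy as np


def laplacian_eigenmaps(X, n_neighbors=6, n_components=2, sigma=0.5):
    """Illustrative Laplacian eigenmaps; assumes the neighbour graph is connected."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between all points.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Indices of the k nearest neighbours of each point, excluding itself.
    sq_no_self = sq.copy()
    np.fill_diagonal(sq_no_self, np.inf)
    idx = np.argsort(sq_no_self, axis=1)[:, :n_neighbors]

    # Heat-kernel weights on the k-NN graph, symmetrised so W is undirected.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W[rows, cols] = np.exp(-sq[rows, cols] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)

    # Symmetric normalised Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(L_sym)

    # Skip the trivial eigenvector (eigenvalue near 0) and map back to the
    # generalised problem L y = lambda D y via y = D^{-1/2} v.
    Y = eigvecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]
    return Y, eigvals[1:n_components + 1]


# Toy data (hypothetical): 60 points along a curved arc in 3-D.
t = np.linspace(0.0, 3.0, 60)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

Y, vals = laplacian_eigenmaps(X, n_neighbors=6, n_components=2, sigma=0.5)
```

Each row of `Y` holds the low-dimensional coordinates of the corresponding input point. In a study like the one described, the rows of `X` would be articulatory or acoustic feature vectors rather than points on a synthetic arc, and the neighbourhood size and kernel width would need tuning to the data.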