Tone correctness improvement in speaker-independent average-voice-based Thai speech synthesis

  • Authors:
  • Suphattharachai Chomphan;Takao Kobayashi

  • Affiliations:
  • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259-G2-4, Nagatsuta-cho, Midori-ku, Yokohama-shi, 226-8502, Japan and Electrical Engineering Division, ...;Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259-G2-4, Nagatsuta-cho, Midori-ku, Yokohama-shi, 226-8502, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2009

Abstract

This paper presents a novel approach to the context-clustering process in speaker-independent HMM-based Thai speech synthesis. Our main objective was to improve the tone correctness (i.e., tone intelligibility) of both the average voice and the speaker-adapted voice. To address the problem of tone neutralization, we incorporated a number of tonal features, called tone-geometrical and phrase-intonation features, into the context-clustering process of the HMM training stage. We carried out subjective and objective evaluations of both the average voice and the adapted voice in terms of tone intelligibility and logarithmic fundamental frequency (F0) error. The effects of the extracted features on the decision trees were also evaluated. Several speech-model scenarios, including male/female and gender-dependent/gender-independent, were implemented to confirm the effectiveness of the proposed approach. The subjective tests revealed that the proposed tonal features could improve tone intelligibility in all speech-model scenarios, and the objective tests yielded corresponding results. Both evaluations confirmed that the proposed tonal features could alleviate the problem of tone neutralization; as a result, the tone correctness of the synthesized speech was significantly improved.
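The abstract names logarithmic F0 error as the objective measure but does not give its formula. As a rough illustration only, the sketch below assumes a root-mean-square error over natural-log F0, two frame-aligned contours in Hz, and unvoiced frames marked as 0; the function name `log_f0_rmse` and all of these conventions are assumptions, not details taken from the paper.

```python
import math


def log_f0_rmse(ref_f0, syn_f0):
    """RMS error of natural-log F0 over frames voiced in both contours.

    Assumptions (not from the paper): contours are frame-aligned,
    values are in Hz, and an unvoiced frame is marked with 0.
    """
    if len(ref_f0) != len(syn_f0):
        raise ValueError("contours must have the same number of frames")
    # Keep only frames where both the reference and the synthesized
    # contour are voiced, since log(0) is undefined.
    squared = [
        (math.log(r) - math.log(s)) ** 2
        for r, s in zip(ref_f0, syn_f0)
        if r > 0 and s > 0
    ]
    if not squared:
        raise ValueError("no frames voiced in both contours")
    return math.sqrt(sum(squared) / len(squared))
```

Under these assumptions, a lower value means the synthesized contour tracks the reference more closely. An actual evaluation might instead use a different log base, cents, or a different treatment of voicing mismatches.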