Phone duration modeling using gradient tree boosting

  • Authors:
  • Junichi Yamagishi; Hisashi Kawai; Takao Kobayashi

  • Affiliations:
  • Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0288, Japan and Interdisciplinary Gradu ...; Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika, Soraku, Kyoto 619-0288, Japan and KDDI R&D Laboratories, ...; Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259-G2-4 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

In text-to-speech synthesis systems, phone duration influences the quality and naturalness of synthetic speech. In this study, we incorporate an ensemble learning technique called gradient tree boosting into phone duration modeling as an alternative to the conventional approach using regression trees, and objectively evaluate its prediction accuracy for Japanese, Mandarin, and English phone durations. The gradient tree boosting algorithm is a meta-algorithm built on regression trees: it iteratively fits a new regression tree to the residuals of the current model and outputs a weighted sum of the regression trees. Our evaluation results show that, compared with regression trees and other regression-tree-based techniques, the gradient tree boosting algorithm substantially and robustly improves the prediction accuracy of phone duration regardless of language, speaker, or domain.
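
The residual-fitting loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a squared-error loss, uses one-split regression trees (stumps) on a single numeric feature, and applies a fixed shrinkage factor as the weight of every tree. The names `fit_stump`, `fit_boosted`, `n_trees`, and `learning_rate`, as well as the toy feature values and durations, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    # Every distinct value except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Only one distinct feature value: fall back to a constant prediction.
        const = mean(residuals)
        return lambda x: const
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient tree boosting for squared error: each tree fits the current residuals."""
    base = mean(ys)                  # initial model: the mean duration
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    # The final prediction is the initial value plus a weighted sum of the trees.
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Made-up toy data: one hypothetical numeric feature and phone durations in ms.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [62.0, 58.0, 71.0, 90.0, 85.0, 110.0, 120.0, 131.0]
model = fit_boosted(xs, ys)
```

With squared error, the residuals are the negative gradient of the loss, so each added tree moves the ensemble a small step toward the training targets. A practical duration model would use deeper trees over many linguistic context features and would be evaluated on held-out data rather than on the training set.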