Stacking Model-Based Korean Prosodic Phrasing Using Speaker Variability Reduction and Linguistic Feature Engineering

  • Authors:
  • Jinsik Lee;Sungjin Lee;Jonghoon Lee;Byeongchang Kim;Gary Geunbae Lee

  • Affiliations:
  • Pohang University of Science and Technology;Pohang University of Science and Technology;Pohang University of Science and Technology;Catholic University of Daegu;Pohang University of Science and Technology

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2012

Abstract

This article presents a prosodic phrasing model for a general-purpose Korean speech synthesis system. To capture the factors that affect prosodic phrasing, linguistically motivated machine-learning features were investigated and combined using a stacking model. Feature engineering further improved phrasing performance. The experiments use a corpus of 4,392 sentences (55,015 words, an average of about 13 words per sentence). Because the corpus contains speaker-dependent variability, which is undesirable in a general-purpose speech synthesis system, a method for reducing that variability is proposed. In addition, the entire data set used in the experiments is made publicly available for future comparative research.
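The abstract does not describe the base learners, the features, or the label set, so the following is only a generic sketch of how stacking works, not the authors' model. It assumes two hypothetical features (`pos`, `dist`), two hypothetical labels (`B` for a phrase break, `NB` for no break), and simple majority-vote lookup tables as stand-ins for both the base learners and the meta-learner. The meta-learner is trained on out-of-fold base predictions, which is the usual way to keep it from overfitting to base-learner training error.

```python
from collections import Counter, defaultdict

def train_lookup(pairs):
    """Fit a majority-vote table mapping each key to its most frequent label."""
    counts = defaultdict(Counter)
    for key, label in pairs:
        counts[key][label] += 1
    # Fall back to the overall majority label for unseen keys.
    default = Counter(label for _, label in pairs).most_common(1)[0][0]
    table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
    return lambda key: table.get(key, default)

def train_stacked(data, feature_names, n_folds=2):
    """Train one base learner per feature plus a meta-learner over their outputs.

    data: list of (feature_dict, label) pairs.
    """
    meta_pairs = []
    for fold in range(n_folds):
        train = [d for i, d in enumerate(data) if i % n_folds != fold]
        held = [d for i, d in enumerate(data) if i % n_folds == fold]
        bases = [train_lookup([(x[f], y) for x, y in train])
                 for f in feature_names]
        # Out-of-fold base predictions become the meta-learner's input.
        for x, y in held:
            preds = tuple(b(x[f]) for b, f in zip(bases, feature_names))
            meta_pairs.append((preds, y))
    meta = train_lookup(meta_pairs)
    # Refit the base learners on all data for use at prediction time.
    bases = [train_lookup([(x[f], y) for x, y in data]) for f in feature_names]

    def predict(x):
        preds = tuple(b(x[f]) for b, f in zip(bases, feature_names))
        return meta(preds)
    return predict

# Toy data with hypothetical feature values; not taken from the paper's corpus.
TOY_DATA = [
    ({"pos": "verb_end", "dist": "far"}, "B"),
    ({"pos": "verb_end", "dist": "far"}, "B"),
    ({"pos": "noun", "dist": "near"}, "NB"),
    ({"pos": "noun", "dist": "near"}, "NB"),
    ({"pos": "verb_end", "dist": "near"}, "B"),
    ({"pos": "verb_end", "dist": "far"}, "B"),
    ({"pos": "noun", "dist": "far"}, "NB"),
    ({"pos": "noun", "dist": "near"}, "NB"),
]

predict = train_stacked(TOY_DATA, ["pos", "dist"])
```

In this toy setting the two base learners disagree on inputs such as `{"pos": "noun", "dist": "far"}`, and the meta-learner resolves the disagreement using what it learned from the held-out predictions. A real system would replace the lookup tables with stronger classifiers.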