Stacking Model-Based Korean Prosodic Phrasing Using Speaker Variability Reduction and Linguistic Feature Engineering

Authors:
Jinsik Lee; Sungjin Lee; Jonghoon Lee; Byeongchang Kim; Gary Geunbae Lee
Affiliations:
Pohang University of Science and Technology; Pohang University of Science and Technology; Pohang University of Science and Technology; Catholic University of Daegu; Pohang University of Science and Technology
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2012

Abstract

This article presents a prosodic phrasing model for a general-purpose Korean speech synthesis system. To capture the factors that affect prosodic phrasing, linguistically motivated machine-learning features were investigated, and these features were effectively incorporated using a stacking model. Phrasing performance was further improved through feature engineering. The experiments use a 4,392-sentence corpus (55,015 words, an average of about 13 words per sentence). Because the corpus contains speaker-dependent variability that is not appropriate to reflect in a general-purpose speech synthesis system, a method to reduce this variability is proposed. In addition, the entire data set used in the experiments is made publicly available for future comparative research.
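The abstract does not spell out how the stacking model is configured. The following is only a generic sketch of stacked sequential learning (Cohen and Carvalho, 2005) applied to binary phrase-break prediction, and it is not the authors' implementation. It makes several assumptions. Each eojeol is described by a tuple of features with label 1 for "break after this word" and 0 otherwise. `LookupClassifier` is a hypothetical toy base learner standing in for whatever classifier was actually used. The tag names in the toy corpus are made up. The key idea is that level-0 predictions are produced by cross-validation over sentences. Neighbouring predictions are then appended as extra features for a level-1 model.

```python
from collections import Counter, defaultdict


class LookupClassifier:
    """Hypothetical toy base learner: predicts the most frequent label seen
    for each exact feature tuple, falling back to the global majority label."""

    def fit(self, X, y):
        counts = defaultdict(Counter)
        for x, label in zip(X, y):
            counts[x][label] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(x, self.default) for x in X]


def flatten(sentences):
    """Turn a list of sentences of (features, label) pairs into flat X, y lists."""
    X = [x for sent in sentences for x, _ in sent]
    y = [label for sent in sentences for _, label in sent]
    return X, y


def cross_val_predictions(sentences, k):
    """Level-0 predictions for every token, each made by a model that never saw
    that token's sentence in training (k-fold split over sentences)."""
    preds = [None] * len(sentences)
    for fold in range(k):
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        model = LookupClassifier().fit(*flatten(train))
        for i, sent in enumerate(sentences):
            if i % k == fold:
                preds[i] = model.predict([x for x, _ in sent])
    return preds


def stack_features(sent_feats, sent_preds, window=1):
    """Append the level-0 predictions in a +/-window around each token
    (padded with -1 at sentence edges) to that token's feature tuple."""
    n = len(sent_feats)
    out = []
    for t, x in enumerate(sent_feats):
        context = tuple(sent_preds[t + d] if 0 <= t + d < n else -1
                        for d in range(-window, window + 1))
        out.append(x + context)
    return out


class StackedTagger:
    """Two-level stacked sequential tagger built from LookupClassifier."""

    def __init__(self, k=2, window=1):
        self.k, self.window = k, window

    def fit(self, sentences):
        # Level 1 is trained on cross-validated (not in-sample) level-0 outputs.
        preds = cross_val_predictions(sentences, self.k)
        X1, y1 = [], []
        for sent, p in zip(sentences, preds):
            X1 += stack_features([x for x, _ in sent], p, self.window)
            y1 += [label for _, label in sent]
        # At test time, level 0 is a model trained on all the data.
        self.level0 = LookupClassifier().fit(*flatten(sentences))
        self.level1 = LookupClassifier().fit(X1, y1)
        return self

    def predict(self, sent_feats):
        p = self.level0.predict(sent_feats)
        return self.level1.predict(stack_features(sent_feats, p, self.window))


if __name__ == "__main__":
    # Hypothetical toy corpus: one feature tuple per eojeol, label 1 = break after it.
    toy = [[(("noun",), 0), (("josa",), 1), (("verb",), 0)]] * 4
    tagger = StackedTagger(k=2, window=1).fit(toy)
    print(tagger.predict([("noun",), ("josa",), ("verb",)]))
```

Using cross-validated rather than in-sample level-0 predictions is the point of the design. Otherwise the level-1 model learns from predictions that are unrealistically accurate and over-trusts them at test time. In a real system, both levels would be replaced by stronger sequence classifiers.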