A prosodic phrasing model for a Korean text-to-speech synthesis system

Authors:
Kyuchul Yoon
Affiliations:
Department of Linguistics, The Ohio State University, 1712 Neil Avenue, Columbus, OH 43210, USA
Venue:
Computer Speech and Language
Year:
2006

Abstract

This paper presents a prosodic phrasing model for Korean, intended for use in a text-to-speech (TTS) synthesis system. Corpora of read text were morpho-syntactically parsed following the Penn Korean Treebank conventions (Han, Chunghye, Ko, Eon-Suk, Yi, Heejong, Palmer, M., 2002. Penn Korean Treebank: development and evaluation. In: Proceedings of the 16th Pacific Asian Conference on Language and Computation. Korean Society for Language and Information.) and prosodically labeled following the K-ToBI conventions (Jun, Sun-Ah, 2000. K-ToBI (Korean ToBI) labelling conventions. Version 3.1. Available from: URL.). Decision trees were trained on morpho-syntactic and textual distance features to predict the locations of accentual and intonational phrase breaks. Cross-validated on a 300-sentence corpus (6,936 words or 21,436 syllables, averaging 23 words or 72 syllables per sentence), the phrasing model predicted non-breaks with F = 92.4% and breaks with F = 88.0% (F = 72.8% for accentual phrase breaks and F = 71.3% for intonational phrase breaks).
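The F scores reported above can be illustrated with a minimal sketch. The abstract does not say how F was computed, so this assumes the balanced F-measure (harmonic mean of precision and recall) scored per label at each word juncture. The label names `none`, `AP`, and `IP`, the function names, and the toy sequences are hypothetical and are not taken from the paper's data.

```python
from typing import List, Sequence, Tuple


def f_measure(gold: Sequence[str], pred: Sequence[str], label: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one label over aligned junctures."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def merge_breaks(labels: Sequence[str]) -> List[str]:
    """Collapse AP and IP into a single 'break' class (an assumed scoring scheme)."""
    return ["none" if lab == "none" else "break" for lab in labels]


# Toy juncture labels, invented purely for illustration.
gold = ["none", "AP", "none", "IP", "none", "AP", "none", "IP"]
pred = ["none", "AP", "AP", "IP", "none", "none", "none", "AP"]

ap_scores = f_measure(gold, pred, "AP")
ip_scores = f_measure(gold, pred, "IP")
break_scores = f_measure(merge_breaks(gold), merge_breaks(pred), "break")

for name, scores in [("AP", ap_scores), ("IP", ip_scores), ("break", break_scores)]:
    print(f"{name}: P={scores[0]:.3f} R={scores[1]:.3f} F={scores[2]:.3f}")
```

In this toy data the merged break class scores higher than either break type alone, because an AP predicted where the gold label is IP still counts as a correctly placed break. This is one plausible reading of why the combined break score in the abstract exceeds the separate accentual and intonational scores; the abstract itself does not state the reason.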