Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system

Authors: Kyuchul Yoon
Affiliations: Department of Linguistics, The Ohio State University, 1712 Neil Avenue, Columbus, OH 43210, USA
Venue: Computer Speech and Language
Year: 2008

Abstract

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean prosodic hierarchy were observed in the diphone selection process, yielding four different versions of each diphone: three edge diphones from the prosodic domains of the intonational phrase (IP), accentual phrase (AP) and prosodic word (PW), and a non-edge diphone from the domain of the prosodic word. Because of the limited size of the corpus that we employed, our system covers only 36.4% of the 6503 possible diphones. A listening experiment designed to evaluate the quality of the diphone database showed that listeners preferred stimuli composed of prosodically appropriate diphones. We interpret this as supporting the view that segments carry prosodic domain information.
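
The storage scheme described in the abstract can be pictured as a table keyed by diphone and prosodic position. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class name DiphoneInventory, the position labels, the unit identifiers, and in particular the fallback order used when a requested version is missing are all hypothetical, since the abstract does not say how gaps in coverage are handled.

```python
from typing import Dict, Optional, Tuple

# The four versions named in the abstract: three edge positions
# (IP, AP, PW) and one non-edge position inside the prosodic word.
# The label strings themselves are hypothetical.
POSITIONS = ("IP_edge", "AP_edge", "PW_edge", "PW_internal")

# Hypothetical fallback order (an assumption, not from the paper):
# try the requested position first, then the remaining positions
# in the order listed in POSITIONS.
def fallback_order(position: str) -> Tuple[str, ...]:
    if position not in POSITIONS:
        raise ValueError(f"unknown prosodic position: {position}")
    return (position,) + tuple(p for p in POSITIONS if p != position)


class DiphoneInventory:
    """Stores one recorded unit per (diphone, prosodic position) pair."""

    def __init__(self) -> None:
        self._units: Dict[Tuple[str, str], str] = {}

    def add(self, diphone: str, position: str, unit_id: str) -> None:
        if position not in POSITIONS:
            raise ValueError(f"unknown prosodic position: {position}")
        self._units[(diphone, position)] = unit_id

    def select(self, diphone: str, position: str) -> Optional[Tuple[str, str]]:
        """Return (position_used, unit_id), or None if the diphone is absent."""
        for candidate in fallback_order(position):
            unit_id = self._units.get((diphone, candidate))
            if unit_id is not None:
                return candidate, unit_id
        return None

    def coverage(self, total_possible: int) -> float:
        """Fraction of possible diphones with at least one stored version.

        Counting per diphone rather than per version is an assumption.
        """
        distinct = {diphone for diphone, _ in self._units}
        return len(distinct) / total_possible


# Illustrative usage with made-up diphone labels and unit identifiers.
inventory = DiphoneInventory()
inventory.add("a-n", "IP_edge", "unit_001")
inventory.add("a-n", "PW_internal", "unit_002")
exact = inventory.select("a-n", "IP_edge")    # requested version exists
backoff = inventory.select("a-n", "AP_edge")  # falls back to another version
missing = inventory.select("k-a", "PW_edge")  # diphone not in the inventory
```

For scale, the reported 36.4% of 6503 possible diphones corresponds to roughly 2367 diphones. Whether that figure counts distinct diphones or individual prosodic versions is not stated in the abstract.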