Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system

  • Authors:
  • Kyuchul Yoon

  • Affiliations:
  • Department of Linguistics, The Ohio State University, 1712 Neil Avenue, Columbus, OH 43210, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean prosodic hierarchy were observed in the diphone selection process, thereby selecting four different versions of each diphone: three edge diphones from the prosodic domains of the intonational phrase (IP), accentual phrase (AP) and prosodic word (PW), and a non-edge diphone from the domain of the prosodic word. Due to the size of the corpus that we employed, our system covers only 36.4% of the 6503 possible diphones. A listening experiment designed to evaluate the quality of the diphone database showed that listeners preferred stimuli composed of prosodically appropriate diphones. We interpret this as supporting the view that segments carry prosodic domain information.