A segmental speech coder based on a concatenative TTS

  • Authors:
  • Ki-Seung Lee; Richard V. Cox

  • Affiliations:
  • Department of Electronic Engineering, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul 143-701, South Korea and Speech Processing Software and Technology Research Department of AT&T Laborator ...; Speech Processing Software and Technology Research Department of AT&T Laboratories Research, NJ

  • Venue:
  • Speech Communication
  • Year:
  • 2002

Abstract

An extremely low bit rate speech coder based on a recognition/synthesis paradigm is proposed. In our speech coder, the speech signal is produced in a way similar to concatenative speech synthesis in text-to-speech (TTS). Hence, database construction, unit selection and prosody modification, which are the major parts of concatenative TTS, are employed to implement the speech coder. The synthesis units are found automatically in a large database using a joint segmentation/classification scheme. Dynamic programming (DP) is applied to unit selection, in which two cost functions, an acoustic target cost and a concatenation cost, are used to increase naturalness as well as intelligibility. Prosodic differences between the selected unit and the input segment are compensated for by time-scale and pitch modifications based on the harmonic plus noise model (HNM) framework. In single-speaker tests, the proposed scheme gave intelligible and natural-sounding speech at an average bit rate of about 580 b/s.
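The DP unit-selection step can be illustrated with a minimal Viterbi-style sketch. It assumes each input segment has a non-empty list of candidate database units and that the target and concatenation costs are supplied as plain functions; the names `select_units`, `w_target` and `w_concat`, and the toy scalar "units" below, are hypothetical placeholders for illustration, not the coder's actual features or cost definitions.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    targets: Sequence[object],
    candidates: Sequence[Sequence[object]],
    target_cost: Callable[[object, object], float],
    concat_cost: Callable[[object, object], float],
    w_target: float = 1.0,  # hypothetical weight on the target cost
    w_concat: float = 1.0,  # hypothetical weight on the concatenation cost
) -> Tuple[List[int], float]:
    """Viterbi-style DP over candidate units (illustrative sketch only).

    targets[i] is the i-th input segment; candidates[i] is a non-empty list
    of database units allowed for it. Returns the chosen candidate index
    per segment and the total weighted cost of that path.
    """
    n = len(targets)
    if n == 0:
        return [], 0.0
    # cost[i][j]: best cumulative cost of any path ending in candidate j at segment i
    cost = [[w_target * target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            tc = w_target * target_cost(targets[i], u)
            # Best predecessor = cumulative cost + cost of joining it to u
            best_k, best = min(
                ((k, cost[i - 1][k] + w_concat * concat_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda kv: kv[1],
            )
            row.append(best + tc)
            ptr.append(best_k)
        cost.append(row)
        back.append(ptr)
    # Backtrack from the cheapest candidate for the last segment
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    total = cost[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total
```

For example, with scalar toy units, `select_units([1.0, 2.0, 3.0], [[0.9, 5.0], [2.2, 1.0], [3.1, 0.0]], lambda t, u: abs(t - u), lambda p, u: abs(p - u), w_concat=0.5)` picks the first candidate for every segment. Because the concatenation cost couples neighbouring choices, the DP can prefer a unit with a worse target match when it joins more smoothly to its neighbours, which a per-segment greedy choice cannot do.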