TTS based very low bit rate speech coder

Authors:
Ki-Seung Lee;R. V. Cox
Affiliations:
AT&T Labs-Research, Florham Park, NJ, USA
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1999

Citing 0
Cited 1

A segmental speech coder based on a concatenative TTS

Speech Communication

Abstract

This paper addresses a speech coder that uses a text-to-speech (TTS) synthesis system to achieve very low bit rates (below 1 kbps). The main issue of the work is the accurate coding of the pitch (f0) and gain contours, which are principal components of prosody. This is of paramount interest because correct prosody increases naturalness and an efficient coding scheme provides high coding gain. Together with the phonetic transcription, the f0 and gain contours constitute the parameters that the TTS system needs to synthesize the speech signal. Piecewise linear approximation is used to code the f0 parameter. A technique that minimizes the bit rate while keeping the f0 error below a given threshold is described. To obtain both high compression and smoothly changing gain contours, the variance of the signal, averaged over each half-phoneme length, is transmitted as gain information. With single-speaker stimuli and a priori text transcription information, we obtained natural-sounding speech at an average bit rate of about 300 bps.
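
The two prosody parameters described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how breakpoints are chosen, so a simple greedy search stands in for the bit-rate minimization, and the function names (piecewise_linear_f0, half_phoneme_gain), the frame-indexed f0 list, and the sample-index phoneme boundaries are all assumptions made for illustration. The gain is computed as mean squared amplitude, which equals the variance only if the signal is zero-mean.

```python
def piecewise_linear_f0(f0, max_err_hz):
    """Greedy sketch (assumed, not the paper's method): choose breakpoints so
    that linear interpolation between consecutive breakpoints stays within
    max_err_hz of every frame. Assumes a gap-free (fully voiced) contour."""
    n = len(f0)
    breakpoints = [0]
    start = 0
    while start < n - 1:
        end = start + 1  # a segment spanning two adjacent frames has zero error
        # Extend the segment until the straight line first violates the bound.
        for cand in range(start + 2, n):
            slope = (f0[cand] - f0[start]) / (cand - start)
            worst = max(abs(f0[start] + slope * (t - start) - f0[t])
                        for t in range(start, cand + 1))
            if worst > max_err_hz:
                break
            end = cand
        breakpoints.append(end)
        start = end
    return breakpoints


def half_phoneme_gain(samples, boundaries):
    """Mean squared amplitude over each half of every phoneme.
    boundaries holds hypothetical phoneme edges as sample indices;
    each phoneme is assumed to contain at least two samples."""
    gains = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        mid = (a + b) // 2
        for lo, hi in ((a, mid), (mid, b)):
            seg = samples[lo:hi]
            gains.append(sum(x * x for x in seg) / len(seg))
    return gains


if __name__ == "__main__":
    contour = [100, 110, 120, 130, 120, 110, 100]  # made-up f0 values in Hz
    print(piecewise_linear_f0(contour, max_err_hz=1.0))
    print(half_phoneme_gain([1, 1, 1, 1, 2, 2, 2, 2], [0, 8]))
```

Only the breakpoint positions and their f0 values would need to be transmitted under this scheme, and raising max_err_hz trades contour accuracy for fewer breakpoints. The greedy search does not guarantee the minimum number of segments; a dynamic-programming search would be one way to pursue that.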