A comparison of spectral smoothing methods for segment concatenation based speech synthesis

  • Authors:
  • David T. Chappell; John H. L. Hansen

  • Affiliations:
  • Department of Electrical Engineering, P.O. Box 90291, Duke University, Durham, NC; Robust Speech Processing Laboratory (RSPL), Center for Spoken Language Research (CSLR), Room E265, University of Colorado, 3215 Marine St., P.O. Box 594, Boulder, CO and Department of Electrical E ...

  • Venue:
  • Speech Communication
  • Year:
  • 2002

Abstract

There are many scenarios in both speech synthesis and coding in which adjacent time-frames of speech are spectrally discontinuous. This paper addresses the topic of improving concatenative speech synthesis with a limited database by proposing methods to smooth, adjust, or interpolate the spectral transitions between speech segments. The objective is to produce natural-sounding speech via segment concatenation when formants and other spectral features do not align properly. We consider several methods for adjusting the spectra at the boundaries between waveform segments. Techniques examined include optimal coupling, waveform interpolation (WI), linear predictive parameter interpolation, and psychoacoustic closure. Several of these algorithms have been previously developed for either coding or synthesis, while others are enhanced here. We also consider the connection between speech science and articulation in determining the type of smoothing appropriate for given phoneme-phoneme transitions. Moreover, this work incorporates a recently proposed auditory-neural based distance measure (ANBM), which employs a computational model of the auditory system to assess perceived spectral discontinuities. We demonstrate how actual ANBM scores can be used to help determine the need for smoothing. In addition, formal evaluation of four smoothing methods, using the ANBM and extensive listener tests, reveals that smoothing can distinctly improve the quality of speech but, when applied inappropriately, can also degrade it. It is shown that after proper spectral smoothing, or spectral interpolation, the final synthesized speech sounds more natural and has a more continuous spectral structure.
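To illustrate the general idea behind linear predictive parameter interpolation (not the paper's specific implementation), the sketch below linearly interpolates line spectral frequency (LSF) vectors across a segment boundary; LSFs are a common interpolation domain for LP parameters because interpolated values remain stable. The function name, frame counts, and LSF values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: smooth a concatenation boundary by linearly
# interpolating the LSF vectors of the two boundary frames.
# All names and values are illustrative, not from the paper.

def interpolate_lsf(lsf_left, lsf_right, n_frames):
    """Return n_frames intermediate LSF vectors between the last frame of
    the left segment and the first frame of the right segment
    (endpoints excluded)."""
    frames = []
    for k in range(1, n_frames + 1):
        alpha = k / (n_frames + 1)  # interpolation weight, 0 < alpha < 1
        frames.append([(1 - alpha) * a + alpha * b
                       for a, b in zip(lsf_left, lsf_right)])
    return frames

# Example: one intermediate frame halfway between two 2-coefficient LSF sets.
mid = interpolate_lsf([0.1, 0.3], [0.2, 0.5], 1)
```

Each interpolated LSF vector would then be converted back to LP coefficients to re-synthesize the transition frames, replacing the abrupt spectral jump with a gradual formant movement.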