Data-driven voice source waveform modelling

Authors:
Mark R. P. Thomas;Jon Gudnason;Patrick A. Naylor
Affiliations:
Imperial College London, Exhibition Road, SW7 2AZ, UK;Imperial College London, Exhibition Road, SW7 2AZ, UK;Imperial College London, Exhibition Road, SW7 2AZ, UK
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal estimated by inverse-filtering a speech signal with an estimate of the vocal tract filter. It is used in speech analysis, synthesis, recognition and coding to decompose a speech signal into its source and vocal tract filter components. Existing approaches parameterize the voice source signal with physically or mathematically motivated models. Although these models are well defined, estimation of their parameters is not well understood, and few can reproduce the large variety of voice source waveforms. Here we present a data-driven approach that classifies types of voice source waveform from their mel-frequency cepstral coefficients using Gaussian mixture modelling. A set of “prototype” waveform classes is derived from a weighted average of voice source cycles from real data. An unknown speech signal is then decomposed into its prototype components and resynthesized. Results indicate that low resynthesis errors can be achieved with sixteen voice source classes.
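The pipeline described in the abstract (cluster cycles by spectral features, average them into prototypes, resynthesize) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cycles are synthetic and already length-normalised, the features are log-magnitude FFT bins standing in for mel-frequency cepstral coefficients, the mixture model is a small diagonal-covariance EM written for this example, and two classes are used rather than sixteen. All function names are hypothetical.

```python
import numpy as np


def responsibilities(X, weights, means, variances):
    """Posterior probability of each mixture component for each feature row."""
    diff = X[:, None, :] - means[None, :, :]
    # Log of a diagonal-covariance Gaussian density, per sample and component.
    log_p = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_p = log_p + np.log(weights + 1e-12)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k, n_iter=50):
    """Fit a diagonal-covariance GMM with EM (deterministic farthest-point init)."""
    n = X.shape[0]
    idx = [0]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-12                          # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances


def build_prototypes(cycles, resp):
    """Responsibility-weighted average of the cycles assigned to each class."""
    nk = resp.sum(axis=0) + 1e-12
    return (resp.T @ cycles) / nk[:, None]


# --- Toy data: two families of synthetic "voice source" cycles (assumption) ---
rng = np.random.default_rng(0)
L = 64                                     # samples per length-normalised cycle
t = np.linspace(0.0, 1.0, L, endpoint=False)
shape_a = np.sin(np.pi * t) ** 2           # smooth pulse
shape_b = np.where(t < 0.7, t / 0.7, 0.0)  # ramp with abrupt closure
cycles = np.vstack([np.tile(shape_a, (40, 1)), np.tile(shape_b, (40, 1))])
cycles = cycles + 0.01 * rng.standard_normal(cycles.shape)

# Stand-in features: log magnitude of low FFT bins (NOT true MFCCs).
features = np.log(np.abs(np.fft.rfft(cycles, axis=1))[:, 1:9] + 1e-3)

weights, means, variances = fit_diag_gmm(features, k=2)
resp = responsibilities(features, weights, means, variances)
labels = resp.argmax(axis=1)

# Prototypes from training cycles, then resynthesis as a weighted prototype mix.
protos = build_prototypes(cycles, resp)
resynth = resp @ protos
rel_err = np.linalg.norm(cycles - resynth) / np.linalg.norm(cycles)
print("relative resynthesis error:", rel_err)
```

With real speech, the cycles would instead come from inverse-filtered speech segmented at glottal closure instants, and the number of classes would be chosen by examining resynthesis error, as the abstract suggests for sixteen classes.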