Perceptual audio modeling with exponentially damped sinusoids

Authors:
Kris Hermus;Werner Verhelst;Philippe Lemmerling;Patrick Wambacq;Sabine Van Huffel
Affiliations:
Department of Electrical Engineering - ESAT, Laboratory of Processing Speech and Images (PSI), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electronics & Information processing, Digital Speech & Audio Processing Lab, Faculty of Applied Science, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium;Department of Electrical Engineering - ESAT, Research Group SISTA, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electrical Engineering - ESAT, Laboratory of Processing Speech and Images (PSI), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electrical Engineering - ESAT, Research Group SISTA, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
Venue:
Signal Processing - Content-based image and video retrieval
Year:
2005

Abstract

This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to modeling the transient segments that are frequently found in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e., the amplitudes, phases, frequencies and damping factors of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of modeling components required for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and provide support for applying the new model in the fields of parametric audio processing and coding.
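
To make the ESM concrete, the following is a minimal Python sketch of how a sum of exponentially damped sinusoids could be synthesized and how its parameters might be recovered with a Hankel-based TLS estimator. The abstract does not state which TLS variant the authors use, so the HTLS-style procedure here is an assumption, as are the function names `synth_esm` and `htls_esm`, the normalized-frequency convention (cycles per sample) and the choice of Hankel matrix size. The psychoacoustic model and the subband decomposition are not covered.

```python
import numpy as np


def synth_esm(amps, phases, freqs, damps, n_samples):
    """Sum of damped sinusoids: a * exp(d*n) * cos(2*pi*f*n + phi), with d <= 0."""
    n = np.arange(n_samples)
    x = np.zeros(n_samples)
    for a, phi, f, d in zip(amps, phases, freqs, damps):
        x += a * np.exp(d * n) * np.cos(2 * np.pi * f * n + phi)
    return x


def htls_esm(x, n_sinusoids):
    """Estimate ESM parameters of a real signal with a Hankel TLS scheme (assumed variant)."""
    x = np.asarray(x, dtype=float)
    n_samples = x.size
    p = 2 * n_sinusoids  # each real damped sinusoid is a conjugate pair of complex poles

    # Hankel data matrix H[i, j] = x[i + j], roughly square.
    rows = n_samples // 2 + 1
    cols = n_samples - rows + 1
    hankel = x[np.arange(rows)[:, None] + np.arange(cols)[None, :]]

    # Signal subspace: the p dominant left singular vectors.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    u_sig = u[:, :p]

    # Shift invariance: u_sig[:-1] @ Z ~= u_sig[1:], solved in the TLS sense.
    _, _, vh = np.linalg.svd(np.hstack([u_sig[:-1], u_sig[1:]]), full_matrices=False)
    v = vh.conj().T
    z_mat = -v[:p, p:] @ np.linalg.inv(v[p:, p:])

    # The eigenvalues of Z are the signal poles z_k = exp(d_k + j*2*pi*f_k).
    poles = np.linalg.eigvals(z_mat)

    # Complex amplitudes from a linear least-squares fit on the Vandermonde matrix.
    vand = poles[None, :] ** np.arange(n_samples)[:, None]
    coeffs, *_ = np.linalg.lstsq(vand, x.astype(complex), rcond=None)

    # Keep one pole per conjugate pair (positive frequency), sorted by frequency.
    keep = poles.imag > 0
    poles, coeffs = poles[keep], coeffs[keep]
    order = np.argsort(np.angle(poles))
    poles, coeffs = poles[order], coeffs[order]

    return {
        "amps": 2 * np.abs(coeffs),
        "phases": np.angle(coeffs),
        "freqs": np.angle(poles) / (2 * np.pi),
        "damps": np.log(np.abs(poles)),
    }


# Usage: two damped sinusoids, noise-free, then parameter recovery.
signal = synth_esm([1.0, 0.5], [0.3, -1.0], [0.05, 0.12], [-0.01, -0.03], 256)
estimate = htls_esm(signal, n_sinusoids=2)
```

In this sketch the model order `n_sinusoids` plays the role of the user-defined number of damped sinusoids mentioned above. A perceptual variant along the lines described would presumably run such an estimator per subband and choose the order per subband from the psychoacoustic model, but that step is not shown here.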