Perceptual audio modeling with exponentially damped sinusoids

  • Authors:
  • Kris Hermus;Werner Verhelst;Philippe Lemmerling;Patrick Wambacq;Sabine Van Huffel

  • Affiliations:
  • Department of Electrical Engineering - ESAT, Laboratory of Processing Speech and Images (PSI), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electronics & Information processing, Digital Speech & Audio Processing Lab, Faculty of Applied Science, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium;Department of Electrical Engineering - ESAT, Research Group SISTA, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electrical Engineering - ESAT, Laboratory of Processing Speech and Images (PSI), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium;Department of Electrical Engineering - ESAT, Research Group SISTA, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

  • Venue:
  • Signal Processing - Content-based image and video retrieval
  • Year:
  • 2005

Abstract

This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to modeling the transient segments that are common in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e., the amplitudes, phases, frequencies and damping factors of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of modeling components required for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and provide support for applying the new model in the fields of parametric audio processing and coding.
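
To make the ESM concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common real-valued form x[n] = sum_k a_k exp(-d_k n) cos(2 pi f_k n + phi_k), with frequencies in cycles per sample, and estimates the parameters with a generic Hankel-matrix TLS procedure (shift invariance of the signal subspace, solved in the total least squares sense). The function names `esm_synthesize` and `esm_estimate`, the Hankel dimensions, and the test values are illustrative assumptions; the subband decomposition and the MPEG-1 Layer 1 psychoacoustic weighting described in the abstract are not included.

```python
import numpy as np

def esm_synthesize(amps, phases, freqs, damps, n_samples):
    """Sum of exponentially damped sinusoids (freqs in cycles/sample)."""
    n = np.arange(n_samples)
    x = np.zeros(n_samples)
    for a, p, f, d in zip(amps, phases, freqs, damps):
        x += a * np.exp(-d * n) * np.cos(2 * np.pi * f * n + p)
    return x

def esm_estimate(x, n_sinusoids):
    """Hankel-TLS style estimate of ESM parameters (illustrative sketch).

    Returns (amps, phases, freqs, damps), sorted by increasing frequency.
    """
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    order = 2 * n_sinusoids            # each real sinusoid = 2 complex poles
    rows = n_samples // 2              # assumed Hankel shape, roughly square
    cols = n_samples - rows + 1
    hankel = np.array([x[i:i + cols] for i in range(rows)])

    # Signal subspace: leading left singular vectors of the Hankel matrix.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    us = u[:, :order]

    # Shift invariance us[:-1] @ Z ~ us[1:], solved by total least squares.
    stacked = np.hstack([us[:-1], us[1:]])
    _, _, wh = np.linalg.svd(stacked, full_matrices=True)
    w = wh.conj().T
    w12 = w[:order, order:]
    w22 = w[order:, order:]
    z_matrix = -w12 @ np.linalg.inv(w22)
    poles = np.linalg.eigvals(z_matrix)

    # Complex amplitudes from a linear least squares fit on a Vandermonde basis.
    basis = np.vander(poles, n_samples, increasing=True).T
    coeffs = np.linalg.lstsq(basis, x.astype(complex), rcond=None)[0]

    # Keep one pole per conjugate pair (positive frequency side).
    keep = poles.imag > 0
    poles, coeffs = poles[keep], coeffs[keep]
    idx = np.argsort(np.angle(poles))
    poles, coeffs = poles[idx], coeffs[idx]

    freqs = np.angle(poles) / (2 * np.pi)
    damps = -np.log(np.abs(poles))
    amps = 2 * np.abs(coeffs)          # cosine = half-sum of conjugate terms
    phases = np.angle(coeffs)
    return amps, phases, freqs, damps

if __name__ == "__main__":
    true = dict(amps=[1.0, 0.5], phases=[0.3, -1.0],
                freqs=[0.05, 0.12], damps=[0.01, 0.03])
    signal = esm_synthesize(n_samples=200, **true)
    est = esm_estimate(signal, n_sinusoids=2)
    for name, values in zip(["amps", "phases", "freqs", "damps"], est):
        print(name, np.round(values, 6))
```

On noise-free synthetic data the estimates should match the generating parameters to within numerical precision. With noisy or real audio frames, the chosen number of sinusoids and the Hankel dimensions matter, and the plain criterion here is a waveform (SNR-type) fit, which is exactly what the perceptual subband scheme in the abstract sets out to replace.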