A perceptual model for sinusoidal audio coding based on spectral integration

Authors:
Steven van de Par, Armin Kohlrausch, Richard Heusdens, Jesper Jensen, Søren Holdt Jensen
Affiliations:
Steven van de Par: Digital Signal Processing Group, Philips Research Laboratories, Eindhoven, The Netherlands
Armin Kohlrausch: Digital Signal Processing Group, Philips Research Laboratories, Eindhoven, The Netherlands; Department of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands
Richard Heusdens: Department of Mediamatics, Delft University of Technology, Delft, The Netherlands
Jesper Jensen: Department of Mediamatics, Delft University of Technology, Delft, The Netherlands
Søren Holdt Jensen: Department of Communication Technology, Institute of Electronic Systems, Aalborg University, Aalborg, Denmark
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2005

Abstract

Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio, which has created the need for a psychoacoustical model that is specifically suited to sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the detectability of distortions. In fact, the distortion detectability defines a (perceptually relevant) norm on the underlying signal space, which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and comparing the results with those obtained with the model recommended for ISO MPEG-1 Layers I and II. Listening tests show a clear preference for the new model. More specifically, the model presented here reduces the number of sinusoids needed to represent signals at a given quality level by more than 20%.
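The abstract states that the predicted distortion detectability defines a perceptually relevant norm. The Python sketch below illustrates that idea under explicit assumptions: each auditory filter is approximated by a fourth-order gammatone magnitude response with ERB bandwidths (Glasberg and Moore), the outer/middle-ear transfer function is omitted, and the constants `c_s` and `c_a`, the filter spacing, and all function names are hypothetical. It is not the authors' implementation; it only shows how summing per-filter distortion-to-masker power ratios (a simple form of spectral integration) yields a frequency-weighted squared L2 norm of the distortion spectrum.

```python
import math  # kept for users who extend the sketch (e.g. dB conversions)


def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_power_response(f, fc, order=4):
    """Approximate squared magnitude of a gammatone filter centred at fc.

    Assumption: |G(f)|^2 ~ (1 + ((f - fc) / b)^2)^(-order), b = 1.019 * ERB(fc).
    """
    b = 1.019 * erb_hz(fc)
    return (1.0 + ((f - fc) / b) ** 2) ** (-order)


def detectability_weights(freqs, masker_power, centre_freqs, c_s=1.0, c_a=1e-6):
    """Per-bin weights a_k such that D = sum_k a_k * |E_k|^2.

    Every hypothetical auditory filter adds its own distortion-to-masker
    power ratio, so D integrates evidence across filters. c_s and c_a are
    placeholder calibration constants, not values from the paper.
    """
    weights = [0.0] * len(freqs)
    for fc in centre_freqs:
        gains = [gammatone_power_response(f, fc) for f in freqs]
        # Masker power at the output of this filter, plus a small
        # absolute-threshold term c_a so the ratio stays finite in silence.
        masker_in_filter = sum(g * p for g, p in zip(gains, masker_power)) + c_a
        for k, g in enumerate(gains):
            weights[k] += c_s * g / masker_in_filter
    return weights


def distortion_detectability(freqs, masker_power, distortion_power, centre_freqs):
    """Return D >= 0; sqrt(D) acts as a weighted L2 norm of the distortion spectrum."""
    a = detectability_weights(freqs, masker_power, centre_freqs)
    return sum(w * e for w, e in zip(a, distortion_power))


# Usage: a 1 kHz masking sinusoid and a small distortion component at 1.1 kHz.
freqs = [100.0 * k for k in range(1, 81)]                 # 100 Hz ... 8 kHz bins
centres = [fc for fc in freqs if fc % 200 == 0]           # hypothetical filter spacing
masker = [1.0 if f == 1000.0 else 1e-4 for f in freqs]    # masker power spectrum
error = [1e-3 if f == 1100.0 else 0.0 for f in freqs]     # distortion power spectrum
d = distortion_detectability(freqs, masker, error, centres)
```

Because D is linear in the distortion power spectrum with non-negative weights, scaling the distortion amplitude by a factor g scales D by g², so sqrt(D) behaves like a weighted Euclidean norm on the error spectrum. The same distortion placed far from the masker (for example at 4 kHz) receives a much larger weight than one close to it, which is the qualitative masking behaviour such a measure is meant to capture inside an optimisation loop.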