Time-frequency sparsity by removing perceptually irrelevant components using a simple model of simultaneous masking

Authors:
Peter Balazs; Bernhard Laback; Gerhard Eckel; Werner A. Deutsch
Affiliations:
Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria (Balazs, Laback, Deutsch); Institute of Electronic Music and Acoustics, University of Music and Dramatic Arts, Graz, Austria (Eckel)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We present an algorithm that removes time-frequency components of a "real-world" sound, obtained with a standard Gabor transform, without causing any audible difference from the original sound after resynthesis. This makes the time-frequency representation sparser. The selection of removable components is based on a simple model of simultaneous masking in the auditory system. The main design goals were applicability to any real-world music or speech sound, inclusion of mutual masking effects between time-frequency components, handling of the time-frequency spread that removing components causes, and computational efficiency. The algorithm first estimates the masked threshold within an analysis window. This masked threshold function is then shifted in level by an experimentally determined amount, and all components falling below the shifted function (the irrelevance threshold) are removed. The shift provides a conservative way to account both for the uncertainty effects of removing time-frequency components and for inaccuracies in the masking model. The removal of components is formulated as an adaptive Gabor multiplier. Thirty-six normal-hearing subjects took part in an experiment to determine the largest shift value for which they could not discriminate the irrelevance-filtered signal from the original. Averaged across the test stimuli, 32 percent of the time-frequency components fell below the irrelevance threshold.
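The processing chain described in the abstract (Gabor analysis, masked-threshold estimate, level shift, removal of components below the irrelevance threshold, resynthesis) can be sketched in numpy as follows. This is a minimal illustration under stated assumptions, not the authors' model: the Gabor transform is taken as a tight frame with a square-root periodic Hann window at 50% overlap; the masked threshold comes from a hypothetical Bark-domain triangular spreading function whose slopes (27 dB/Bark below the masker, 10 dB/Bark above) are assumed values, with each component's own contribution excluded so that only mutual masking counts; and `shift_db` is a placeholder for the experimentally determined shift, with an illustrative default. The function names are hypothetical. The removal step is a Gabor multiplier whose 0/1 symbol depends on the signal, which is what makes it adaptive.

```python
import numpy as np


def gabor_analysis(x, n_win):
    """Gabor coefficients with a tight sqrt-Hann window and hop n_win // 2 (n_win even)."""
    hop = n_win // 2
    # Square root of a periodic Hann window: its squared translates by hop sum to 1,
    # so the window is its own canonical dual (tight frame, redundancy 2).
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_win) / n_win))
    n_frames = int(np.ceil(len(x) / hop)) + 1
    padded = np.zeros((n_frames + 1) * hop)
    padded[hop:hop + len(x)] = x  # every input sample is covered by exactly two frames
    frames = np.stack([padded[m * hop:m * hop + n_win] * win for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1), win


def gabor_synthesis(coeffs, win, length):
    """Overlap-add with the same window; inverts gabor_analysis if coeffs are unchanged."""
    n_win = len(win)
    hop = n_win // 2
    frames = np.fft.irfft(coeffs, n=n_win, axis=1) * win
    out = np.zeros((len(coeffs) + 1) * hop)
    for m, frame in enumerate(frames):
        out[m * hop:m * hop + n_win] += frame
    return out[hop:hop + length]


def bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt analytic approximation)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)


def masked_threshold(power, z, lower_slope=27.0, upper_slope=10.0):
    """Per-frame masked threshold from an ASSUMED triangular spreading function.

    Masker j contributes power[j] * 10**(spread_db / 10) at bin k, where spread_db
    drops by lower_slope dB/Bark below the masker and by upper_slope dB/Bark above
    it. Contributions from all other bins add in the power domain.
    """
    dz = z[:, None] - z[None, :]  # dz[k, j] = z(maskee k) - z(masker j)
    spread_db = np.where(dz < 0, lower_slope * dz, -upper_slope * dz)
    spread = 10.0 ** (spread_db / 10.0)
    np.fill_diagonal(spread, 0.0)  # mutual masking only: a component does not mask itself
    return power @ spread.T  # thr[m, k] = sum_j power[m, j] * spread[k, j]


def irrelevance_filter(x, fs, n_win=1024, shift_db=-10.0):
    """Zero every Gabor coefficient below the shifted masked threshold.

    shift_db is a hypothetical placeholder for the experimentally determined shift.
    Returns the resynthesized signal and the fraction of removed coefficients.
    """
    coeffs, win = gabor_analysis(x, n_win)
    power = np.abs(coeffs) ** 2
    z = bark(np.fft.rfftfreq(n_win, d=1.0 / fs))
    irrelevance = masked_threshold(power, z) * 10.0 ** (shift_db / 10.0)
    mask = power >= irrelevance  # signal-dependent 0/1 symbol of the Gabor multiplier
    y = gabor_synthesis(coeffs * mask, win, len(x))
    return y, 1.0 - mask.mean()
```

For example, `irrelevance_filter(x, 16000)` on a 1 kHz tone with weak added noise keeps the tone's main-lobe coefficients and removes most of the low-level leakage and noise coefficients around it. The second return value is a rough analogue of the percentage of removed components quoted above; with this toy model it is not expected to reproduce the reported figure. Setting `shift_db` very low removes nothing, and because the frame is tight the output then equals the input up to rounding.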