This article deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To this end, we first generate sparse audio representations we call spikegrams, using projections onto gammatone/gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are well suited to identifying auditory events such as onsets, offsets, transients, and harmonic structures. We show that introducing adaptiveness into the selection of gammachirp kernels improves the compression rate compared to the case where the kernels are non-adaptive. We also integrate a masking model that helps reduce the bitrate without loss of perceptible audio quality. Finally, we propose a method to extract frequently occurring audio objects (patterns) from the aforementioned sparse representations. The extracted frequency-domain patterns (audio objects) let us address spikes (audio events) collectively rather than individually. When audio compression is needed, the different patterns are stored in a small codebook that can be used to encode audio material efficiently and losslessly. The approach is applied to different audio signals, and the results are discussed and compared. This work is a first step towards the design of a high-quality auditory-inspired "object-based" audio coder.
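The abstract does not spell out how the spikegram is computed, but the description (projections onto gammatone/gammachirp kernels yielding sparse "spikes") suggests a greedy matching-pursuit decomposition over time-shifted auditory kernels. Below is a minimal, illustrative sketch of that idea, not the authors' implementation: the gammatone formula, the ERB-based bandwidth rule, and all parameter values (sampling rate, kernel order and duration, number of spikes) are assumptions chosen for the example.

```python
import numpy as np

def gammatone(fc, fs=16000, order=4, duration=0.03):
    """Unit-norm gammatone kernel t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).

    Bandwidth b follows the Glasberg & Moore ERB rule (an assumption here).
    """
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * (24.7 + 0.108 * fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def spikegram(signal, center_freqs, fs=16000, n_spikes=50):
    """Greedy matching pursuit over time-shifted gammatone kernels.

    Each iteration picks the (time, channel) pair with the largest inner
    product with the residual, records it as a spike (sample_index, channel,
    amplitude), and subtracts the matched atom from the residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    kernels = [gammatone(fc, fs) for fc in center_freqs]
    spikes = []
    for _ in range(n_spikes):
        best = None  # (|correlation|, time index, channel, signed amplitude)
        for ch, k in enumerate(kernels):
            # Inner products of the residual with k at every valid time shift.
            corr = np.correlate(residual, k, mode="valid")
            t = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[t]) > best[0]:
                best = (abs(corr[t]), t, ch, corr[t])
        _, t, ch, amp = best
        residual[t:t + len(kernels[ch])] -= amp * kernels[ch]
        spikes.append((t, ch, amp))
    return spikes, residual
```

The spike list (time, channel, amplitude) is the sparse event-based representation; subsequent stages described in the abstract (adaptive gammachirp selection, masking, pattern extraction into a codebook) would operate on such spikes rather than on raw samples.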