Union of MDCT Bases for Audio Coding

Authors:
E. Ravelli;G. Richard;L. Daudet
Affiliations:
Inst. Jean le Rond d'Alembert-LAM, Univ. Pierre et Marie Curie-Paris, Paris
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Cited by (9):
Efficient architectures of MDCT/IMDCT implementation for MPEG audio codec. ASID'09 Proceedings of the 3rd international conference on Anti-Counterfeiting, security, and identification in communication
Parametric dictionary design for sparse coding. IEEE Transactions on Signal Processing
Audio signal representations for indexing in the transform domain. IEEE Transactions on Audio, Speech, and Language Processing
Adaptive signal modeling based on sparse approximations for scalable parametric audio coding. IEEE Transactions on Audio, Speech, and Language Processing
Incorporating scale information with cepstral features: Experiments on musical instrument recognition. Pattern Recognition Letters
How sparsely can a signal be approximated while keeping its class identity? Proceedings of 3rd international workshop on Machine learning and music
On similarity search in audio signals using adaptive sparse approximations. AMR'09 Proceedings of the 7th international conference on Adaptive multimedia retrieval: understanding media and adapting to the user
Auditory-inspired sparse representation of audio signals. Speech Communication
Matching Pursuits with random sequential subdictionaries. Signal Processing

Abstract

This paper investigates the use of sparse overcomplete decompositions for audio coding. Audio signals are decomposed over a redundant union of modified discrete cosine transform (MDCT) bases at eight different scales. This approach produces a sparser decomposition than the traditional orthogonal MDCT and allows better coding efficiency at low bitrates. Unlike state-of-the-art low-bitrate coders, which are based on purely parametric or hybrid representations, our approach is able to provide transparency. Moreover, we use a bitplane encoding approach, which yields a fine-grain scalable coder that operates seamlessly from very low bitrates up to transparency. Objective evaluation and listening tests show that our coder performs significantly better than a state-of-the-art transform coder at very low bitrates and similarly at high bitrates. We provide a link to test sound files and source code to allow better evaluation and reproducibility of the results.
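The idea of decomposing a signal over a redundant union of MDCT bases can be illustrated with a small sketch. It is not the authors' implementation: the abstract names neither the window, the scale values, nor the decomposition algorithm, so the sine window, the three toy scales (standing in for the eight scales of the paper), and the use of matching pursuit as the greedy solver are all assumptions made here for illustration. The function names `mdct_atoms`, `union_of_mdct`, and `matching_pursuit` are hypothetical.

```python
import numpy as np

def mdct_atoms(n_samples, n_bins):
    """Unit-norm MDCT atoms (one per row) with hop n_bins and window length 2*n_bins."""
    win_len = 2 * n_bins
    u = np.arange(win_len)
    window = np.sin(np.pi * (u + 0.5) / win_len)  # sine window (assumed)
    k = np.arange(n_bins)[:, None]
    frame = window * np.cos(np.pi / n_bins * (u + 0.5 + n_bins / 2) * (k + 0.5))
    blocks = []
    # Only frames that fit entirely inside the signal; edge handling is skipped.
    for start in range(0, n_samples - win_len + 1, n_bins):
        block = np.zeros((n_bins, n_samples))
        block[:, start:start + win_len] = frame
        blocks.append(block)
    atoms = np.vstack(blocks)
    # Normalize explicitly so the pursuit below does not rely on a scaling convention.
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def union_of_mdct(n_samples, scales):
    """Stack the MDCT atoms of several scales into one redundant dictionary."""
    return np.vstack([mdct_atoms(n_samples, n) for n in scales])

def matching_pursuit(x, dictionary, n_iter):
    """Greedy pursuit (assumed solver): pick the best-correlated atom, subtract, repeat."""
    residual = x.astype(float).copy()
    picks = []
    for _ in range(n_iter):
        corr = dictionary @ residual
        best = int(np.argmax(np.abs(corr)))
        picks.append((best, float(corr[best])))
        residual -= corr[best] * dictionary[best]
    return picks, residual

# Toy signal: a steady sinusoid plus a single click.
t = np.arange(256)
x = np.sin(2 * np.pi * 0.05 * t)
x[100] += 3.0

dictionary = union_of_mdct(len(x), scales=(4, 16, 64))  # toy scales, not the paper's
picks, residual = matching_pursuit(x, dictionary, n_iter=20)
print(dictionary.shape, np.sum(residual**2) / np.sum(x**2))
```

In this sketch the short-window atoms are available for the click and the long-window atoms for the sinusoid, which is the intuition behind mixing scales; the printed ratio is the fraction of signal energy left unexplained after 20 atoms.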