Audio signal representations for indexing in the transform domain

Authors:
Emmanuel Ravelli;Gaël Richard;Laurent Daudet
Affiliations:
Université Pierre et Marie Curie-Paris 6, Institut Jean le Rond d'Alembert-LAM, Paris, France;Institut Telecom, Telecom ParisTech, CNRS, LTCI, Paris, France;Université Pierre et Marie Curie-Paris 6, Institut Jean le Rond d'Alembert-LAM, Paris, France
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, because the signals do not have to be fully decoded. Here, we show that the representations used in standard transform-based audio codecs (e.g., MDCT for AAC, or hybrid PQF/MDCT for MP3) have sufficient time resolution for some rhythmic features but poor frequency resolution, which prevents their use in tonality-related applications. In contrast, a recently developed audio codec based on a sparse multi-scale MDCT transform has good resolution for both time- and frequency-domain features. We show that this new codec allows efficient transform-domain audio indexing for three applications: beat tracking, chord recognition, and musical genre classification. We compare the results obtained with this new codec against those of the two standard codecs, MP3 and AAC, in terms of performance and computation time.
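
The time/frequency trade-off mentioned in the abstract can be illustrated with a rough back-of-the-envelope sketch. The numbers below are assumptions and are not taken from the paper: a 44.1 kHz sampling rate, the usual block sizes of the standard codecs (1024 or 128 coefficients per frame for AAC, 576 or 192 frequency lines for MP3), and one hypothetical large MDCT scale of 8192 coefficients standing in for the long windows of a multi-scale transform. The helper names `mdct_resolution` and `semitone_gap_hz` are made up for this illustration.

```python
SAMPLE_RATE_HZ = 44100  # assumed sampling rate, not stated in the abstract


def mdct_resolution(num_coeffs, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (hop size in ms, bin spacing in Hz) for an MDCT frame.

    An MDCT producing num_coeffs coefficients advances by num_coeffs
    samples per frame and spreads those coefficients uniformly over
    the band from 0 to sample_rate_hz / 2.
    """
    hop_ms = 1000.0 * num_coeffs / sample_rate_hz
    bin_hz = (sample_rate_hz / 2.0) / num_coeffs
    return hop_ms, bin_hz


def semitone_gap_hz(freq_hz):
    """Distance in Hz from freq_hz to the note one semitone above it."""
    return freq_hz * (2.0 ** (1.0 / 12.0) - 1.0)


# Coefficients per frame; the last entry is a hypothetical scale chosen
# only to illustrate what a much longer window would give.
CONFIGS = {
    "AAC long block": 1024,
    "AAC short block": 128,
    "MP3 long block": 576,
    "MP3 short block": 192,
    "hypothetical large scale": 8192,
}

if __name__ == "__main__":
    gap = semitone_gap_hz(110.0)  # A2, a typical bass note
    print(f"semitone gap at 110 Hz: {gap:.2f} Hz")
    for name, num_coeffs in CONFIGS.items():
        hop_ms, bin_hz = mdct_resolution(num_coeffs)
        verdict = "resolves" if bin_hz < gap else "cannot resolve"
        print(f"{name:26s} hop {hop_ms:7.2f} ms, "
              f"bin {bin_hz:7.2f} Hz, {verdict} that semitone")
```

Under these assumptions, an AAC long block has a bin spacing of about 21.5 Hz, while adjacent semitones near 110 Hz are only about 6.5 Hz apart, which is consistent with the claim that the standard representations are too coarse in frequency for tonality-related tasks. Their hop sizes of roughly 3 to 23 ms remain short enough for rhythmic analysis. A longer window narrows the bin spacing at the cost of time resolution, which is the motivation for combining several scales.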