Robust audio identification for MP3 popular music

Authors:
Wei Li;Yaduo Liu;Xiangyang Xue
Affiliations:
Fudan University, Shanghai, China;Fudan University, Shanghai, China;Fudan University, Shanghai, China
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

Audio identification via fingerprinting has been an active research field with wide applications for years. Many technical papers have been published and commercial software systems have been deployed. However, most previously reported methods work on raw (uncompressed) audio, even though compressed audio, especially MP3 music, has become the dominant format for storing music on personal computers and transmitting it over the Internet. It would be valuable if an unknown compressed audio fragment could be recognized directly against the database, without the cumbersome and time-consuming decompression-identification-recompression procedure. So far, very few music information retrieval algorithms run directly in the compressed domain, and most of them rely on MDCT coefficients or on energy-type features derived from them. As a first attempt, we propose in this paper using compressed-domain spectral entropy as the audio feature for a novel audio fingerprinting algorithm. The compressed songs stored in a music database and the possibly distorted compressed query excerpts are first partially decompressed to obtain the MDCT coefficients as an intermediate result. We then group granules into longer blocks, remap the MDCT coefficients onto 192 new frequency lines to unify the frequency distribution of long and short windows, and define 9 new subbands, aligned with the scale-factor bands of short windows, that cover the main frequency bandwidth of popular songs. From these, we calculate the spectral entropy of all consecutive blocks and derive the final fingerprint sequence through magnitude relationship modeling. Experiments show that such fingerprints are strongly robust against various audio signal distortions, such as recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification.
For 5-second query excerpts, which may be severely degraded, an average top-five retrieval precision of more than 90% is obtained on our test set of 1822 popular songs.
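
The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea of an entropy-based fingerprint and should not be read as the authors' implementation. It assumes that the spectral entropy of a block is the Shannon entropy of its normalized subband energies, and that "magnitude relationship modeling" means emitting one bit per pair of adjacent blocks depending on which entropy is larger. The function names and the subband edges are hypothetical placeholders.

```python
import math
from typing import List, Sequence


def spectral_entropy(subband_energies: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the normalized subband energy distribution."""
    total = sum(subband_energies)
    if total <= 0.0:
        return 0.0  # assumption: a silent block is given entropy 0
    entropy = 0.0
    for energy in subband_energies:
        if energy > 0.0:
            p = energy / total
            entropy -= p * math.log2(p)
    return entropy


def block_entropy(mdct_lines: Sequence[float], band_edges: Sequence[int]) -> float:
    """Entropy of one block, given its remapped MDCT lines and subband edges.

    band_edges holds n+1 ascending indices delimiting n subbands
    (hypothetical values; the paper's actual 9 subbands are not stated here).
    """
    energies = [
        sum(c * c for c in mdct_lines[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]
    return spectral_entropy(energies)


def fingerprint_bits(entropies: Sequence[float]) -> List[int]:
    """Assumed magnitude-relationship rule: 1 if entropy rises, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(entropies, entropies[1:])]


if __name__ == "__main__":
    # Placeholder edges: 9 equal-width subbands over the first 72 of 192 lines.
    edges = [8 * i for i in range(10)]
    flat_block = [1.0] * 192                  # energy spread evenly over subbands
    peaky_block = [1.0] * 8 + [0.0] * 184     # energy concentrated in one subband
    entropies = [block_entropy(b, edges) for b in (flat_block, peaky_block, flat_block)]
    print(fingerprint_bits(entropies))
```

A sign-of-difference rule of this kind depends only on the ordering of neighbouring entropy values, not on their absolute levels, which is one plausible reason such a fingerprint could tolerate distortions like equalization or recompression. The paper itself should be consulted for the actual block length, subband layout, and bit-derivation rule.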