Supervised dictionary learning for music genre classification

  • Authors:
  • Chin-Chia Michael Yeh; Yi-Hsuan Yang

  • Affiliations:
  • Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan; Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan

  • Venue:
  • Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
  • Year:
  • 2012


Abstract

This paper concerns the development of a music codebook for summarizing local feature descriptors computed over time. Compared with a holistic representation, this text-like representation better captures the rich and time-varying information of music. We systematically compare a number of existing codebook generation techniques and also propose a new one that incorporates labeled data into the dictionary learning process. Several aspects of the encoding system, such as local feature extraction and codeword encoding, are also analyzed. Our results demonstrate the superiority of sparsity-enforced dictionary learning over conventional VQ-based or exemplar-based methods. With the new supervised dictionary learning algorithm and the optimal settings inferred from the performance study, we achieve state-of-the-art accuracy in music genre classification using just the log-power spectrogram as the local feature descriptor. The classification accuracies on the benchmark datasets GTZAN and ISMIR2004Genre are 84.7% and 90.8%, respectively.
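To make the pipeline described above concrete, the sketch below shows an unsupervised sparse-coding baseline of the same flavor: log-power spectrogram frames are encoded against a learned dictionary and the codes are pooled over time into a song-level vector for a linear classifier. This is not the paper's supervised dictionary learning algorithm or its actual settings; librosa, scikit-learn's MiniBatchDictionaryLearning / sparse_encode / LinearSVC, mean pooling, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

def log_power_frames(path, n_fft=1024, hop=512):
    """Log-power spectrogram of one audio file, one row per frame."""
    y, sr = librosa.load(path, sr=22050)
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    return librosa.power_to_db(S).T                    # (n_frames, n_fft//2 + 1)

def song_feature(frames, dictionary, alpha=1.0):
    """Sparse-code each frame against the codebook, then mean-pool over time."""
    codes = sparse_encode(frames, dictionary, algorithm="lasso_lars", alpha=alpha)
    return codes.mean(axis=0)                          # song-level summary vector

def train_genre_classifier(train_paths, train_labels, n_codewords=256):
    """Learn a sparse dictionary from all training frames, then fit a linear SVM
    on the pooled codes. Returns (dictionary, classifier)."""
    all_frames = np.vstack([log_power_frames(p) for p in train_paths])
    dico = MiniBatchDictionaryLearning(n_components=n_codewords, alpha=1.0,
                                       batch_size=64)
    D = dico.fit(all_frames).components_               # codewords stored as rows
    X = np.array([song_feature(log_power_frames(p), D) for p in train_paths])
    clf = LinearSVC().fit(X, train_labels)
    return D, clf
```

The supervised variant studied in the paper additionally uses the genre labels while learning the dictionary, whereas the sketch learns the codebook purely from reconstruction; only the encode-then-pool structure is shared.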