Ensemble Discriminant Sparse Projections Applied to Music Genre Classification

Authors:
Constantine Kotropoulos;Gonzalo R. Arce;Yannis Panagakis
Venue:
ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
Year:
2010

Citing 0
Cited 1

An analysis of the GTZAN music genre dataset

Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies

Abstract

Exploiting the rich, psycho-physiologically grounded properties of the slow temporal modulations of music recordings, a novel classifier ensemble is built that applies discriminant sparse projections. More specifically, overcomplete dictionaries are learned and sparse coefficient vectors are extracted to optimally approximate the slow temporal modulations of the training music recordings. The sparse coefficient vectors are then projected onto the principal subspaces of their within-class and between-class covariance matrices. Decisions are made according to the minimum Euclidean distance from the class-mean sparse coefficient vectors, which undergo the same projections. Majority voting over the decisions of 10 individual classifiers, trained on the 10 training folds defined by stratified 10-fold cross-validation on the GTZAN dataset, yields an average music genre classification accuracy of 84.96%. This exceeds by 2.46% the highest accuracy previously reported without employing any sparse representations.
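
The decision stage described in the abstract (projection, nearest class mean, majority voting) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sparse coefficient vectors are already available as rows of a NumPy array, skips dictionary learning entirely, and guesses at one plausible reading of the two-stage projection: first onto the leading eigenvectors of the within-class covariance, then onto those of the between-class covariance. The function names and the parameters k_within and k_between are hypothetical.

```python
import numpy as np

def top_eigvecs(cov, k):
    """Return the k eigenvectors of a symmetric matrix with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]

def fit_projected_ncm(X, y, k_within, k_between):
    """Fit a nearest-class-mean classifier in a two-stage projected subspace.

    X: (n_samples, n_features) coefficient vectors (assumed precomputed).
    y: (n_samples,) integer class labels.
    """
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    grand_mean = X.mean(axis=0)

    # Stage 1 (assumed): principal subspace of the within-class covariance.
    centered = X - means[np.searchsorted(classes, y)]
    s_within = centered.T @ centered / len(X)
    w1 = top_eigvecs(s_within, k_within)

    # Stage 2 (assumed): principal subspace of the between-class covariance,
    # computed in the stage-1 coordinates.
    m = (means - grand_mean) @ w1
    counts = np.array([(y == c).sum() for c in classes])
    s_between = (m * counts[:, None]).T @ m / len(X)
    w2 = top_eigvecs(s_between, k_between)

    projection = w1 @ w2
    return classes, projection, means @ projection

def predict(model, X):
    """Assign each row of X to the class whose projected mean is nearest."""
    classes, projection, projected_means = model
    z = X @ projection
    sq_dist = ((z[:, None, :] - projected_means[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dist.argmin(axis=1)]

def majority_vote(predictions):
    """Combine (n_classifiers, n_samples) non-negative integer labels by majority."""
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])
```

In the setting of the abstract, one such model would be fitted per training fold, each model would label the test recordings with predict, and majority_vote would combine the 10 label vectors. Ties are resolved here in favour of the smallest label, which is an arbitrary choice of this sketch.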