Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification

Authors:
Yannis Panagakis;Constantine Kotropoulos;Gonzalo R. Arce
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Electrical and Computer Engineering, University of Delaware, Newark, DE
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Motivated by psychophysiological investigations of the human auditory system, a bio-inspired two-dimensional auditory representation of music signals that captures the slow temporal modulations is exploited. Although each recording is represented by a second-order tensor (i.e., a matrix), a third-order tensor is needed to represent a music corpus. Non-negative multilinear principal component analysis (NMPCA) is proposed for the unsupervised dimensionality reduction of the third-order tensors. The NMPCA maximizes the total tensor scatter while preserving the non-negativity of auditory representations. An algorithm for NMPCA is derived by exploiting the structure of the Grassmann manifold. The NMPCA is compared against three multilinear subspace analysis techniques, namely non-negative tensor factorization, high-order singular value decomposition, and multilinear principal component analysis, as well as their linear counterparts, i.e., non-negative matrix factorization, singular value decomposition, and principal component analysis. In each case, the extracted features are subsequently classified by either support vector machine or nearest neighbor classifiers. Three different sets of experiments conducted on the GTZAN and the ISMIR2004 Genre datasets demonstrate the superiority of NMPCA over the aforementioned subspace analysis techniques in extracting more discriminating features, especially when the training set has small cardinality. The best classification accuracies reported in the paper exceed those obtained by the state-of-the-art music genre classification algorithms applied to both datasets.
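
To make the objective named in the abstract concrete, the following is a minimal sketch of the total tensor scatter for a toy corpus of matrices, before and after a multilinear projection Y = U1^T X U2 with fixed non-negative projection matrices. It is not the NMPCA algorithm, which the abstract says optimizes over the Grassmann manifold. It only evaluates the quantity that such an algorithm would maximize. All function names, the toy data, and the projection matrices are assumptions made for illustration, and plain Python lists are used so the example has no dependencies.

```python
# Illustrative sketch only: names and toy data are assumptions, not from the paper.

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def project(x, u1, u2):
    """Multilinear projection Y = U1^T X U2 of one matrix sample."""
    return matmul(matmul(transpose(u1), x), u2)

def mean_matrix(corpus):
    """Element-wise mean over the samples (slices of the third-order tensor)."""
    n = len(corpus)
    rows, cols = len(corpus[0]), len(corpus[0][0])
    return [[sum(m[r][c] for m in corpus) / n for c in range(cols)]
            for r in range(rows)]

def total_scatter(corpus):
    """Sum of squared Frobenius distances of the samples to their mean."""
    mu = mean_matrix(corpus)
    return sum((m[r][c] - mu[r][c]) ** 2
               for m in corpus
               for r in range(len(mu))
               for c in range(len(mu[0])))

# Toy corpus: two non-negative 2x2 "recordings" stacked along the sample mode.
corpus = [[[1, 2], [3, 4]],
          [[3, 2], [1, 0]]]

# Hypothetical fixed non-negative projection matrices (not learned here).
u1 = [[1], [0]]          # keeps the first row mode component (2 -> 1)
u2 = [[1, 0], [0, 1]]    # identity on the column mode (2 -> 2)

projected = [project(x, u1, u2) for x in corpus]

scatter_before = total_scatter(corpus)
scatter_after = total_scatter(projected)
# Non-negative data and non-negative projections give non-negative features.
all_nonnegative = all(v >= 0 for y in projected for row in y for v in row)
```

In this sketch the projection matrices are chosen by hand. A method such as the one described in the abstract would instead search for non-negative matrices that make `scatter_after` as large as possible.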