Dictionary learning based sparse coefficients for audio classification with max and average pooling

  • Authors:
  • Syed Zubair, Fei Yan, Wenwu Wang

  • Affiliations:
  • Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, UK (all authors)

  • Venue:
  • Digital Signal Processing
  • Year:
  • 2013

Abstract

Audio classification is an important problem in signal processing and pattern recognition, with potential applications in audio retrieval, documentation and scene analysis. As in general signal classification systems, it involves both training and classification (or testing) stages. The performance of an audio classification system, such as its complexity and classification accuracy, depends heavily on the choice of the signal features and the classifiers. Several features have been widely exploited in existing methods, such as the mel-frequency cepstrum coefficients (MFCCs), line spectral frequencies (LSF) and short time energy (STE). In this paper, instead of using these well-established features, we explore the potential of sparse features, derived from a dictionary of signal atoms using sparse coding based on, e.g., orthogonal matching pursuit (OMP), where the atoms are adapted directly from audio training data using the K-SVD dictionary learning algorithm. To reduce the computational complexity, we propose to perform pooling and sampling operations on the sparse coefficients. Such operations also help to maintain a unified dimension of the signal features, regardless of the various lengths of the training and testing signals. Using the popular support vector machine (SVM) as the classifier, we examine the performance of the proposed classification system for two binary classification problems, namely speech-music classification and male-female speech discrimination, and a multi-class problem, speaker identification. The experimental results show that the sparse (max-pooled and average-pooled) coefficients perform better than the classical MFCC features, in particular for noisy audio data.
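The pooling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and pooling the absolute values of the coefficients is an assumption (the paper may pool raw coefficients). The input plays the role of the OMP coefficient matrix (atoms × frames); pooling over the frame axis yields a feature vector whose dimension is independent of the signal length.

```python
import numpy as np

def pool_sparse_coefficients(coeffs):
    """Pool an (n_atoms, n_frames) sparse coefficient matrix into a
    fixed-length feature vector, independent of the number of frames.

    Hypothetical sketch: concatenates max-pooled and average-pooled
    coefficient magnitudes, giving a vector of length 2 * n_atoms.
    """
    mags = np.abs(coeffs)                 # assumption: pool magnitudes
    max_pooled = mags.max(axis=1)         # strongest activation per atom
    avg_pooled = mags.mean(axis=1)        # average activation per atom
    return np.concatenate([max_pooled, avg_pooled])

# Signals of different lengths produce features of identical dimension,
# which is what makes a fixed-input classifier such as an SVM applicable.
rng = np.random.default_rng(0)
short_clip = rng.standard_normal((64, 10))   # 64 atoms, 10 frames
long_clip = rng.standard_normal((64, 250))   # 64 atoms, 250 frames
assert pool_sparse_coefficients(short_clip).shape == (128,)
assert pool_sparse_coefficients(long_clip).shape == (128,)
```

In practice the coefficient matrix would come from running OMP against a K-SVD-learned dictionary on each audio frame; the pooled vectors then feed the SVM in both training and testing.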