Efficient coding of time-relative structure using spikes. Neural Computation.
Sparse representations of polyphonic music. Signal Processing (special issue: Sparse Approximations in Signal and Image Processing).
Environmental sound recognition with time-frequency audio features. IEEE Transactions on Audio, Speech, and Language Processing.
The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter.
Although rare in the sound recognition literature, previous work using features derived from a sparse temporal representation has met with some success [8, 2, 9]. A great advantage of deriving features from a temporal representation is that such an approach avoids the trade-off between time and frequency resolution. Here, we present a biologically inspired two-step process for audio classification: in the first step, efficient basis functions are learned in an unsupervised manner [12] from mixtures of percussion sounds (drum phrases); in the second step, features are extracted by using the learned basis functions to decompose percussion sounds (bass drum, snare drum, hi-hat) with matching pursuit [7]. The classification accuracy in a 3-class database transfer task is 91.5%, compared to 70.7% when using MFCC features. Furthermore, we show that an MP-feature representation preserves sound similarity to a greater extent than MFCC features: an artificial mixture of two sounds of equal energy typically lies midway between the two single-sound distributions in feature space. An MP representation thus inherently contains a similarity measure between different sounds.
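The second step above relies on matching pursuit: greedily selecting, at each iteration, the dictionary atom most correlated with the current residual and subtracting its projection. The following is a minimal sketch of that greedy loop, not the authors' implementation; the function name, the fixed atom budget `n_atoms`, and the toy unit-norm dictionary are all illustrative assumptions.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """Greedy matching pursuit (illustrative sketch).

    `dictionary` holds unit-norm atoms as columns. At each step the atom
    with the largest absolute correlation to the residual is selected,
    and its projection is removed from the residual. The (index, coefficient)
    pairs are the kind of sparse decomposition MP features are built from.
    """
    residual = np.asarray(signal, dtype=float).copy()
    atoms = []  # list of (atom index, coefficient) pairs
    for _ in range(n_atoms):
        corr = dictionary.T @ residual          # correlation with every atom
        k = int(np.argmax(np.abs(corr)))        # best-matching atom
        coef = float(corr[k])                   # projection coefficient
        residual -= coef * dictionary[:, k]     # subtract that contribution
        atoms.append((k, coef))
    return atoms, residual
```

With learned basis functions in place of the toy dictionary, the selected atom indices and coefficients (rather than the raw waveform) would serve as the feature representation fed to the classifier.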