Sound sample indexing usually deals with the recognition of the source or cause that produced the sound. For abstract sounds, sound effects, and unnatural or synthetic sounds, this cause is usually unknown or unrecognizable. An efficient description of such sounds has been proposed by Schaeffer under the name morphological description. Part of this description consists of describing a sound by matching the temporal evolution of its acoustic properties to a set of profiles. In this paper, we consider three morphological descriptions: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (up, down, stable, up/down, down/up), and complex-iterative sound description (non-iterative, iterative, grain, repetition). We study the automatic indexing of a sound into these profiles. Because this automatic indexing is difficult using standard audio features, we propose new audio features to perform this task. The dynamic profiles are estimated by modeling the loudness of a sound over time with a second-order B-spline model and deriving features from this model. The melodic profiles are estimated by tracking over time the perceptual filter with the maximum excitation. A function is derived from this track, which is then modeled using a second-order B-spline model; the features are again derived from the B-spline model. The description of complex-iterative sounds is obtained by estimating the amount of repetition and the period of the repetition. These are obtained by computing an audio similarity function derived from a mel-frequency cepstral coefficient (MFCC) similarity matrix. The proposed audio features are then tested for automatic classification. We consider three classification tasks corresponding to the three profiles. In each case, the results are compared with those obtained using standard audio features.
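The dynamic-profile idea can be illustrated with a minimal sketch: fit a second-order (quadratic) B-spline to a loudness envelope and read simple shape features off the fit. The feature set below (start/end slope, peak position, range) is a hypothetical illustration, not the paper's exact feature list; the function name and parameters are assumptions.

```python
import numpy as np
from scipy.interpolate import splrep, splev

def dynamic_profile_features(loudness, smooth=1.0):
    """Fit a second-order B-spline to a loudness envelope and derive
    simple shape features (hypothetical feature set for illustration)."""
    t = np.linspace(0.0, 1.0, len(loudness))
    tck = splrep(t, loudness, k=2, s=smooth)   # k=2: quadratic B-spline
    fit = splev(t, tck)                        # evaluate the smoothed model
    slope = np.gradient(fit, t)
    return {
        "start_slope": float(slope[0]),        # > 0 suggests "ascending"
        "end_slope": float(slope[-1]),         # < 0 suggests "descending"
        "peak_position": float(t[np.argmax(fit)]),
        "range": float(fit.max() - fit.min()),
    }

# Example: an "ascending/descending" envelope peaks roughly mid-way
env = np.concatenate([np.linspace(0, 1, 50), np.linspace(1, 0, 50)])
feats = dynamic_profile_features(env)
```

A classifier would then map such features to the profile classes; e.g. a positive start slope with a negative end slope and a mid-signal peak indicates the ascending/descending profile.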
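The repetition estimate for complex-iterative sounds can likewise be sketched from a frame-wise feature matrix (e.g. MFCCs): build a frame-to-frame similarity matrix and average its diagonals to obtain a lag profile, whose first strong peak gives the repetition period and whose height indicates the amount of repetition. This is a simplified reading of the method; cosine similarity and the diagonal-averaging step are assumptions for illustration.

```python
import numpy as np

def repetition_period(features):
    """Estimate the repetition period from a (frames x dims) feature
    matrix via a cosine similarity matrix and its diagonal means."""
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    S = X @ X.T                                # frame-to-frame cosine similarity
    n = len(S)
    # mean similarity at each lag = average of the k-th diagonal
    lag_profile = np.array([np.mean(np.diag(S, k)) for k in range(n // 2)])
    # skip lag 0 (trivial self-similarity) when locating the peak
    period = int(np.argmax(lag_profile[1:]) + 1)
    strength = float(lag_profile[period])      # close to 1 for strong repetition
    return period, strength

# Example: a synthetic feature sequence that repeats every 10 frames
base = np.random.RandomState(0).rand(10, 13)   # 13 coefficients per frame
feats = np.tile(base, (8, 1))                  # 80 frames, period 10
period, strength = repetition_period(feats)
```

For a perfectly periodic sequence the lag profile peaks at every multiple of the period, so a practical system would also check for such harmonics before reporting the period; a non-iterative sound yields a flat, low profile instead.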