Sound sample indexing usually deals with the recognition of the source or cause that produced the sound. For abstract sounds, sound effects, and unnatural or synthetic sounds, this cause is usually unknown or unrecognizable. An efficient description of such sounds has been proposed by Schaeffer under the name morphological description. Part of this description consists of describing a sound by matching the temporal evolution of its acoustic properties to a set of profiles. In this paper, we consider three morphological descriptions: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (up, down, stable, up/down, down/up), and complex-iterative sound description (non-iterative, iterative, grain, repetition). We study the automatic indexing of a sound into these profiles. Because this automatic indexing is difficult using standard audio features, we propose new audio features to perform this task. The dynamic profiles are estimated by modeling the loudness of a sound over time with a second-order B-spline model and deriving features from this model. The melodic profiles are estimated by tracking over time the perceptual filter with the maximum excitation. A function is derived from this track, which is then modeled using a second-order B-spline model; the features are again derived from the B-spline model. The description of complex-iterative sounds is obtained by estimating the amount of repetition and the period of the repetition. These are obtained by computing an audio similarity function derived from a mel-frequency cepstral coefficient (MFCC) similarity matrix. The proposed audio features are then tested for automatic classification. We consider three classification tasks corresponding to the three profiles. In each case, the results are compared with those obtained using standard audio features.
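The dynamic-profile idea can be illustrated with a minimal sketch: fit a second-order (quadratic) B-spline to a loudness envelope and read simple shape features off the fit. The feature set below (start/end slope, peak position, range) is a hypothetical illustration, not the paper's exact feature list; the function name and parameters are assumptions.

```python
import numpy as np
from scipy.interpolate import splrep, splev

def dynamic_profile_features(loudness, smooth=1.0):
    """Fit a second-order B-spline to a loudness envelope and derive
    simple shape features (hypothetical feature set for illustration)."""
    t = np.linspace(0.0, 1.0, len(loudness))
    tck = splrep(t, loudness, k=2, s=smooth)   # k=2: quadratic B-spline
    fit = splev(t, tck)                        # evaluate the smoothed model
    slope = np.gradient(fit, t)
    return {
        "start_slope": float(slope[0]),        # > 0 suggests "ascending"
        "end_slope": float(slope[-1]),         # < 0 suggests "descending"
        "peak_position": float(t[np.argmax(fit)]),
        "range": float(fit.max() - fit.min()),
    }

# Example: an "ascending/descending" envelope peaks roughly mid-way
env = np.concatenate([np.linspace(0, 1, 50), np.linspace(1, 0, 50)])
feats = dynamic_profile_features(env)
```

A classifier would then map such features to the profile classes; e.g. a positive start slope with a negative end slope and a mid-signal peak indicates the ascending/descending profile.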
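The repetition estimate for complex-iterative sounds can likewise be sketched from a frame-wise feature matrix (e.g. MFCCs): build a frame-to-frame similarity matrix and average its diagonals to obtain a lag profile, whose first strong peak gives the repetition period and whose height indicates the amount of repetition. This is a simplified reading of the method; cosine similarity and the diagonal-averaging step are assumptions for illustration.

```python
import numpy as np

def repetition_period(features):
    """Estimate the repetition period from a (frames x dims) feature
    matrix via a cosine similarity matrix and its diagonal means."""
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    S = X @ X.T                                # frame-to-frame cosine similarity
    n = len(S)
    # mean similarity at each lag = average of the k-th diagonal
    lag_profile = np.array([np.mean(np.diag(S, k)) for k in range(n // 2)])
    # skip lag 0 (trivial self-similarity) when locating the peak
    period = int(np.argmax(lag_profile[1:]) + 1)
    strength = float(lag_profile[period])      # close to 1 for strong repetition
    return period, strength

# Example: a synthetic feature sequence that repeats every 10 frames
base = np.random.RandomState(0).rand(10, 13)   # 13 coefficients per frame
feats = np.tile(base, (8, 1))                  # 80 frames, period 10
period, strength = repetition_period(feats)
```

For a perfectly periodic sequence the lag profile peaks at every multiple of the period, so a practical system would also check for such harmonics before reporting the period; a non-iterative sound yields a flat, low profile instead.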