This paper presents a model-free, training-free, two-phase method for audio segmentation that divides monophonic, heterogeneous audio files into acoustically homogeneous regions, each containing a single sound. In the first phase, a rough segmentation splits the audio input into clips using silence detection in the time domain. In the second phase, a self-similarity matrix is computed from selected frequency-domain audio features to measure the similarity between frames within each clip; an edge detection method then locates regions in the similarity image that correspond to plausible sounds. The boundaries produced by the two phases are combined to form the final segmentation of the input audio. The two-phase method is evaluated against established methods on a standard non-musical database and yields more accurate segmentation results than existing audio segmentation approaches. We propose that this approach could be adapted as an efficient pre-processing stage in other audio processing systems such as audio retrieval, classification, music analysis and summarization.
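As a rough sketch of the second phase, a frame-by-frame self-similarity matrix can be computed from per-frame feature vectors using cosine similarity. The frame length, hop size, and feature choice below (magnitude spectra of windowed frames) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a mono signal into overlapping frames (illustrative sizes)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def self_similarity(features, eps=1e-12):
    """Cosine self-similarity matrix S, with S[i, j] in [-1, 1]."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    return unit @ unit.T

# Toy clip: the second half differs spectrally from the first,
# standing in for two adjacent "sounds" in a clip.
t = np.arange(4096) / 8000.0
x = np.concatenate([np.sin(2 * np.pi * 440 * t),
                    np.sin(2 * np.pi * 880 * t)])

frames = frame_signal(x)
# Magnitude spectra as stand-in frame features (the paper selects
# frequency-domain features; the exact feature set is an assumption here).
feats = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
S = self_similarity(feats)
# Frames within the same sound are highly similar; across the change
# point similarity drops, producing a block structure along the diagonal
# whose edges the edge detection step would pick out as boundaries.
```

In this sketch the similarity image `S` shows two bright blocks on its diagonal, one per sound; detecting the edges between blocks recovers the within-clip boundaries that are then merged with the silence-based boundaries from the first phase.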