This paper presents a model-free, training-free, two-phase method for audio segmentation that divides monophonic, heterogeneous audio files into acoustically homogeneous regions, each containing a single sound. In the first phase, a rough segmentation splits the audio input into clips using silence detection in the time domain. In the second phase, a self-similarity matrix is computed from selected frequency-domain audio features to measure the similarity between frames within each clip; an edge detection method then locates regions in the similarity image that correspond to plausible sounds. The boundaries produced by the two phases are combined to form the final segmentation of the input audio. The two-phase method is evaluated against established methods on a standard non-musical database and yields more accurate segmentation results than existing audio segmentation approaches. We propose that this approach could be adapted as an efficient pre-processing stage in other audio processing systems such as audio retrieval, classification, music analysis and summarization.
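As a rough sketch of the second phase, a frame-by-frame self-similarity matrix can be computed from per-frame feature vectors using cosine similarity. The frame length, hop size, and feature choice below (magnitude spectra of windowed frames) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a mono signal into overlapping frames (illustrative sizes)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def self_similarity(features, eps=1e-12):
    """Cosine self-similarity matrix S, with S[i, j] in [-1, 1]."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    return unit @ unit.T

# Toy clip: the second half differs spectrally from the first,
# standing in for two adjacent "sounds" in a clip.
t = np.arange(4096) / 8000.0
x = np.concatenate([np.sin(2 * np.pi * 440 * t),
                    np.sin(2 * np.pi * 880 * t)])

frames = frame_signal(x)
# Magnitude spectra as stand-in frame features (the paper selects
# frequency-domain features; the exact feature set is an assumption here).
feats = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
S = self_similarity(feats)
# Frames within the same sound are highly similar; across the change
# point similarity drops, producing a block structure along the diagonal
# whose edges the edge detection step would pick out as boundaries.
```

In this sketch the similarity image `S` shows two bright blocks on its diagonal, one per sound; detecting the edges between blocks recovers the within-clip boundaries that are then merged with the silence-based boundaries from the first phase.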