A two phase method for general audio segmentation

  • Authors:
  • Jessie Xin Zhang; Jacqueline Whalley; Stephen Brooks

  • Affiliations:
  • School of Computing and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand; Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

  • Venue:
  • ICME '09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
  • Year:
  • 2009


Abstract

This paper presents a model-free and training-free two-phase method for audio segmentation that separates monophonic heterogeneous audio files into acoustically homogeneous regions, each containing a single sound. First, a rough segmentation divides the audio input into clips based on silence detection in the time domain. Then, for each clip, a self-similarity matrix is computed from selected frequency-domain audio features to measure the similarity between frames. An edge detection method is subsequently applied to find regions in the similarity image that correspond to plausible sounds within the clip. The results of the two phases are combined to form the final boundaries for the input audio. The two-phase method is evaluated against established methods on a standard non-musical database and yields more accurate segmentation results than existing audio segmentation methods. We propose that this approach could be adapted as an efficient pre-processing stage in other audio processing systems such as audio retrieval, classification, music analysis and summarization.
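The pipeline described in the abstract can be sketched in a few steps: energy-based silence detection for the rough pass, a cosine self-similarity matrix over per-frame features, and an edge (novelty) detector run along the matrix diagonal. The sketch below is a minimal NumPy illustration of those ideas, not the authors' implementation; in particular, the frame length, RMS threshold, feature choice, and the checkerboard-kernel edge detector (Foote-style novelty) are assumptions, since the paper's exact features and edge detection method are not given here.

```python
import numpy as np

def silence_split(signal, frame_len=1024, threshold=0.01):
    """Phase 1 (sketch): flag frames whose RMS energy falls below a
    threshold as silence; return the voiced mask and the frame indices
    where the voiced/silent state changes (rough clip boundaries)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    voiced = rms > threshold
    changes = np.flatnonzero(np.diff(voiced.astype(int))) + 1
    return voiced, changes

def self_similarity(features):
    """Phase 2 (sketch): cosine self-similarity matrix between
    per-frame feature vectors (one row per frame)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T

def novelty(ssm, kernel_size=8):
    """Edge detection along the diagonal of the similarity matrix
    using a checkerboard kernel (an assumed stand-in for the paper's
    edge detector): peaks mark transitions between homogeneous sounds."""
    k = kernel_size
    quad = np.ones((k, k))
    kernel = np.block([[quad, -quad], [-quad, quad]])
    n = ssm.shape[0]
    scores = np.zeros(n)
    for i in range(k, n - k):
        scores[i] = np.sum(kernel * ssm[i - k:i + k, i - k:i + k])
    return scores
```

For example, a signal containing a tone, a silent gap, and another tone yields boundary frames around the gap from `silence_split`, and a similarity matrix built from two distinct feature blocks produces a novelty peak at the block transition.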