Audio signal segmentation and classification using fuzzy c-means clustering

  • Authors:
  • Naoki Nitanda;Miki Haseyama;Hideo Kitajima

  • Affiliations:
  • Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 060-0814 Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 060-0814 Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 060-0814 Japan

  • Venue:
  • Systems and Computers in Japan
  • Year:
  • 2006

Abstract

This paper proposes a method for segmenting and classifying audio signals coded with MPEG Audio. The proposed method first uses fuzzy c-means clustering to detect the boundaries between two different audio signals, called audio-cuts. It then classifies the segments bounded by the detected audio-cuts, called audio-segments. Conventional methods detect audio-cuts by thresholding audio features such as energy and zero-crossing rate, so misdetection often occurs when they are applied to audio signals that contain audio effects such as fade-in, fade-out, and cross-fade. The proposed method instead represents the possibility that an audio-cut exists as a real number from 0 to 1, obtained by fuzzy c-means clustering, and detects all possible audio-cut candidates. Because audio effects that are difficult to detect with conventional methods are also detected as candidates, misdetection can be reduced. The audio-segments bounded by these candidates are then classified into five audio classes: silence, speech, music, speech with music background, and speech with noise background. The classification results are used to remove unnecessary audio-cuts, so that both accurate audio-cut detection and accurate audio-segment classification can be attained.

© 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(4): 23–34, 2006; Published online in Wiley InterScience. DOI 10.1002/scj.20491
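The idea of scoring each boundary with a fuzzy membership instead of a hard threshold can be illustrated with a minimal sketch. This is not the authors' implementation. The paper operates on MPEG Audio coded data, while this sketch assumes raw PCM samples, uses short-time energy and zero-crossing rate as stand-in features, and clusters the change in those features across each frame boundary with standard fuzzy c-means (two clusters: "cut" and "no cut"). The frame length, the candidate threshold of 0.3, and every function name (`frame_features`, `fuzzy_c_means`, `cut_possibility`, `candidate_cuts`, `prune_cuts`) are hypothetical. The five-class segment classifier is not shown; its labels are assumed to be given.

```python
import numpy as np


def frame_features(x, frame_len):
    """Short-time energy and zero-crossing rate per non-overlapping frame.
    Hypothetical stand-in features: the paper works on MPEG Audio data."""
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.stack([energy, zcr], axis=1)


def fuzzy_c_means(data, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns centers and memberships u (each row sums to 1)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(data), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Each center is the membership-weighted mean of the data.
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        # u_ik is proportional to d_ik^(-2/(m-1)); clamp to avoid division by zero.
        inv = np.fmax(dist, 1e-12) ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return centers, u


def cut_possibility(features):
    """Degree in [0, 1] to which each boundary between frames t and t+1 is an audio-cut."""
    diff = np.abs(np.diff(features, axis=0))
    diff = diff / (diff.max(axis=0) + 1e-12)  # put energy and ZCR changes on one scale
    centers, u = fuzzy_c_means(diff, c=2)
    cut_cluster = int(np.argmax(centers.sum(axis=1)))  # cluster with the larger change
    return u[:, cut_cluster]


def candidate_cuts(possibility, frame_len, threshold=0.3):
    """Sample positions of boundaries whose possibility exceeds a hypothetical low threshold."""
    idx = np.flatnonzero(possibility > threshold)
    return [int((i + 1) * frame_len) for i in idx]


def prune_cuts(cuts, labels):
    """Drop a cut when the audio-segments on both sides received the same class label."""
    return [cut for cut, left, right in zip(cuts, labels[:-1], labels[1:]) if left != right]


# Usage: 4000 samples of silence followed by a loud alternating signal.
x = np.concatenate([np.zeros(4000), np.tile([1.0, -1.0], 2000)])
p = cut_possibility(frame_features(x, 400))
print(candidate_cuts(p, 400))                                   # [4000]
print(prune_cuts([4000, 6000], ["silence", "music", "music"]))  # [4000]
```

A low threshold mirrors the abstract's strategy of keeping every plausible candidate, including gradual transitions such as fades whose memberships fall between 0 and 1. Spurious candidates are then removed only when the segments on both sides are classified into the same class.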