Audio scene segmentation using multiple features, models and time scales

  • Authors:
  • H. Sundaram; S.-F. Chang

  • Affiliations:
  • Dept. of Electr. Eng., Columbia Univ., New York, NY, USA

  • Venue:
  • ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 04
  • Year:
  • 2000

Abstract

We present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sound sources. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: a definition of an audio scene; multiple feature models that characterize the dominant sources; and a simple, causal listener model that mimics human audition using multiple time scales. We define a correlation function that measures how strongly the present data correlates with past data, and use it to determine segmentation boundaries. The algorithm was tested on a difficult data set, a 1-hour audio segment of a film, and achieved an audio scene change detection accuracy of 97%.
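
The idea of comparing present data against past data at two time scales can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the correlation function, so the sketch assumes generic per-frame feature vectors and swaps in cosine similarity between the mean of a short recent window and the mean of the longer preceding window. The names `attention`, `memory`, `threshold`, and `detect_boundaries` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def mean_vector(frames):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def detect_boundaries(frames, attention=4, memory=12, threshold=0.5):
    """Return frame indices where recent data stops resembling the past.

    Assumptions (not taken from the paper):
      frames    -- list of per-frame feature vectors
      attention -- number of most recent frames (short time scale)
      memory    -- total frames remembered, including the attention span
      threshold -- similarity below which a local minimum counts as a boundary
    The procedure is causal: the score at time t uses only frames before t.
    """
    times = range(memory, len(frames) + 1)
    scores = []
    for t in times:
        recent = mean_vector(frames[t - attention:t])
        past = mean_vector(frames[t - memory:t - attention])
        scores.append(cosine(recent, past))

    boundaries = []
    for i in range(1, len(scores) - 1):
        s = scores[i]
        # A boundary is a local minimum of similarity that falls below the threshold.
        if s < threshold and s < scores[i - 1] and s <= scores[i + 1]:
            # The minimum occurs once the attention span is entirely new data,
            # so the change itself happened `attention` frames earlier.
            boundaries.append(times[i] - attention)
    return boundaries


# Synthetic example: 20 frames dominated by one source, then 20 by another.
synthetic = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
print(detect_boundaries(synthetic))
```

In this sketch a boundary is reported only after the attention span has filled with new data, which reflects the delay inherent in any causal detector. Real audio would need actual feature extraction and a tuned threshold, neither of which is specified in the abstract.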