Audio scene segmentation using multiple features, models and time scales

Authors:
H. Sundaram; S.-F. Chang
Affiliations:
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
Venue:
ICASSP '00 Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
Year:
2000

Citing 0
Cited 3

Determining computable scenes in films and their structures using audio-visual memory models
MULTIMEDIA '00 Proceedings of the eighth ACM international conference on Multimedia

Text-like segmentation of general audio for content-based retrieval
IEEE Transactions on Multimedia

Effective TV advertising block division into single commercials method
KES'11 Proceedings of the 15th international conference on Knowledge-based and intelligent information and engineering systems - Volume Part I

Abstract

We present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: a definition of an audio scene; multiple feature models that characterize the dominant sources; and a simple, causal listener model that mimics human audition using multiple time scales. We define a correlation function that measures how strongly the current data correlate with past data, and we use it to determine segmentation boundaries. The algorithm was tested on a difficult data set, a 1-hour audio segment of a film, where it achieved an audio scene change detection accuracy of 97%.
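
The abstract does not give the form of the correlation function or the listener model, so the following is only a toy illustration of the general idea (comparing recent data with past data at several time scales), not the authors' method. It assumes each audio frame has already been reduced to a small feature vector, and it swaps in plain cosine similarity between window means for the paper's correlation function. The names `past_similarity`, `most_likely_boundary`, and the `spans` parameter are hypothetical.

```python
import math


def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def past_similarity(frames, spans=(2, 4)):
    """Map each time t to a similarity score between recent and past data.

    For every span s (an assumed stand-in for one time scale), the mean of
    the s most recent frames is compared with the mean of the s frames
    before them. Scores are averaged over all spans. The computation is
    causal: only frames with index < t are used.
    """
    start = 2 * max(spans)  # earliest t at which every window is full
    scores = {}
    for t in range(start, len(frames) + 1):
        sims = []
        for s in spans:
            recent = mean_vector(frames[t - s:t])
            past = mean_vector(frames[t - 2 * s:t - s])
            sims.append(cosine(recent, past))
        scores[t] = sum(sims) / len(sims)
    return scores


def most_likely_boundary(scores):
    """Return the time index with the lowest similarity to the past."""
    return min(scores, key=scores.get)


# Synthetic data (made up for illustration): the dominant "source"
# switches from one feature direction to another at frame index 20.
frames = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
scores = past_similarity(frames)
boundary = most_likely_boundary(scores)
```

On this synthetic input the similarity stays at its maximum while the source is stable and dips shortly after frame 20; the dip lags the true change because a causal window needs a few frames to fill with new data.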