Text-like segmentation of general audio for content-based retrieval

Authors:
Lie Lu; Alan Hanjalic
Affiliations:
Microsoft Research Asia, Beijing, China; Department of Mediamatics, Delft University of Technology, Delft, The Netherlands
Venue:
IEEE Transactions on Multimedia
Year:
2009

Citing 11
Cited 0

Video Scene Segmentation via Continuous Video Coherence
CVPR '98 Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

How to thematically segment texts by using lexical cohesion?
ACL '98 Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2

Minimal-impact audio-based personal archives
Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences

Unsupervised content discovery in composite audio
Proceedings of the 13th annual ACM international conference on Multimedia

Creating audio keywords for event detection in soccer video
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 1

Towards optimal audio "keywords" detection for audio content analysis and discovery
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia

Audio scene segmentation using multiple features, models and time scales
ICASSP '00 Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04

A flexible framework for key audio effects detection and auditory context inference
IEEE Transactions on Audio, Speech, and Language Processing

Audio Keywords Discovery for Text-Like Audio Content Analysis and Retrieval
IEEE Transactions on Multimedia

Co-clustering for Auditory Scene Categorization
IEEE Transactions on Multimedia

Automated high-level movie segmentation for advanced video-retrieval systems
IEEE Transactions on Circuits and Systems for Video Technology


Abstract

Automatic detection of semantically meaningful audio segments, or audio scenes, is an important step in high-level semantic inference from general audio signals, and can benefit various content-based applications involving both audio and multimodal (multimedia) data sets. Motivated by the known limitations of traditional approaches based on low-level features, we propose in this paper a novel approach to discovering audio scenes, based on an analysis of audio elements and key audio elements, which can be seen as equivalents of the words and keywords in a text document, respectively. In the proposed approach, an audio track is seen as a sequence of audio elements, and the presence of an audio scene boundary at a given time stamp is checked by measuring the pairwise semantic affinity between different parts of the analyzed audio stream surrounding that time stamp. Our proposed model for semantic affinity exploits proven concepts from text document analysis, and is introduced here as a function of the distance between the audio parts considered, and of the co-occurrence statistics and importance weights of the audio elements contained therein. Experimental evaluation on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than traditional low-level feature-based approaches in solving the posed audio scene segmentation problem.
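
The abstract does not give the affinity model in closed form, so the following Python sketch is only an illustration of the general idea, not the authors' formulation. It assumes the audio track has already been reduced to a sequence of audio element labels, and it assumes one plausible form of affinity: the product of the two elements' importance weights, a normalized co-occurrence score, and an exponential decay in the distance between the two positions. The function names, the window sizes, and the `decay` parameter are all hypothetical.

```python
import math
from collections import Counter


def cooccurrence(labels, window=3):
    """Count how often two labels appear within `window` positions of each
    other, scaled so the most frequent pair scores 1.0 (assumed statistic)."""
    counts = Counter()
    for i, a in enumerate(labels):
        for b in labels[i + 1 : i + 1 + window]:
            counts[frozenset((a, b))] += 1
    top = max(counts.values(), default=1)
    return {pair: c / top for pair, c in counts.items()}


def affinity_at(labels, t, cooc, weights=None, span=3, decay=2.0):
    """Mean pairwise affinity between the elements just before position t
    and the elements from position t onward."""
    weights = weights or {}  # missing labels default to importance 1.0
    left = range(max(0, t - span), t)
    right = range(t, min(len(labels), t + span))
    total, pairs = 0.0, 0
    for i in left:
        for j in right:
            a, b = labels[i], labels[j]
            importance = weights.get(a, 1.0) * weights.get(b, 1.0)
            together = cooc.get(frozenset((a, b)), 0.0)
            # Affinity is assumed to decay with the distance between parts.
            total += importance * together * math.exp(-(j - i) / decay)
            pairs += 1
    return total / pairs if pairs else 0.0


def lowest_affinity_boundary(labels, weights=None):
    """Return the position whose surrounding parts have the lowest affinity,
    i.e. the most likely single scene boundary under this sketch."""
    cooc = cooccurrence(labels)
    candidates = range(1, len(labels))
    return min(candidates, key=lambda t: affinity_at(labels, t, cooc, weights))


if __name__ == "__main__":
    track = ["speech", "music"] * 3 + ["car", "siren"] * 3
    print(lowest_affinity_boundary(track))
```

On the toy track above, the speech/music elements co-occur often with each other and rarely with the car/siren elements, so the affinity across position 6 is the lowest and that position is returned. A practical system would more likely threshold the affinity curve or look for local minima to find several boundaries, and would estimate the importance weights from data rather than defaulting them to 1.0.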