Towards optimal audio "keywords" detection for audio content analysis and discovery

Authors:
Lie Lu; Alan Hanjalic
Affiliations:
Microsoft Research Asia; Delft University of Technology
Venue:
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Year:
2006

Abstract

Natural semantic sound clusters in an audio document, also referred to as audio elements, can be seen as analogous to words in a text document. From the obtained set of audio elements, the key audio elements, or audio "keywords", can be detected: these are the elements most prominent in characterizing the content of the audio data. As such, they can be of great use for automatic audio content analysis and discovery. Motivated by the limitations of existing methods for key audio element detection, we propose in this paper a novel unsupervised approach to audio element weighting that uses multiple audio documents, analogous to word weighting in text document analysis. In our approach, dominant feature vectors (DFVs) are first extracted from each audio element and used to measure the similarity between audio elements. Based on this similarity, the occurrence probability of an audio element in different audio documents can be estimated. Then four factors, namely expected term frequency, expected inverse document frequency, expected term duration, and expected inverse document duration, are calculated and combined to give the importance weight of each audio element. Evaluation of the obtained audio "keywords" and their usability for auditory scene segmentation and audio document clustering, performed on 5 hours of diverse audio data, shows highly promising results.
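The abstract names the four weighting factors but not their formulas, so the following Python fragment is only a minimal sketch of how such factors might be combined, by analogy with TF-IDF in text retrieval. The function name `element_weight`, its parameters, the logarithmic form of the two inverse factors, and the plain product used to combine them are all assumptions made for illustration, not the authors' definitions. The occurrence probabilities, which the paper estimates from DFV-based similarity, are taken here as given inputs.

```python
import math


def element_weight(tf, td, occ_prob, doc_durations):
    """Hypothetical importance weight of one audio element in one document.

    tf            -- element's share of occurrences in its document (0..1)
    td            -- element's share of its document's duration (0..1)
    occ_prob      -- assumed probability that the element occurs in each document
    doc_durations -- duration of each document, in seconds
    """
    if len(occ_prob) != len(doc_durations):
        raise ValueError("occ_prob and doc_durations must have equal length")

    n_docs = len(occ_prob)
    # Expected number of documents that contain the element.
    expected_df = sum(occ_prob)
    # Expected total duration of the documents that contain the element.
    expected_dd = sum(p * d for p, d in zip(occ_prob, doc_durations))
    if expected_df <= 0 or expected_dd <= 0:
        raise ValueError("element must occur in at least one document")

    # Assumed TF-IDF-style inverse factors: rarer elements score higher.
    eidf = math.log(n_docs / expected_df)
    eidd = math.log(sum(doc_durations) / expected_dd)

    # Assumed combination rule: a plain product of the four factors.
    return tf * td * eidf * eidd


# An element likely to occur in only one of two documents ...
rare = element_weight(0.5, 0.5, [1.0, 0.0], [100.0, 100.0])
# ... versus one that is likely to occur in both.
common = element_weight(0.5, 0.5, [1.0, 1.0], [100.0, 100.0])
```

Under these assumptions, an element that is expected to occur in every document receives zero weight, mirroring how inverse document frequency suppresses words that appear throughout a text corpus.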