Unsupervised content discovery in composite audio

Authors:
Rui Cai;Lie Lu;Alan Hanjalic
Affiliations:
Tsinghua Univ., Beijing, China;Microsoft Research Asia, Beijing, China;Delft University of Technology, Delft, The Netherlands
Venue:
Proceedings of the 13th annual ACM international conference on Multimedia
Year:
2005

Abstract

Automatically extracting semantic content from audio streams is useful in many multimedia applications. Traditional supervised approaches to content extraction are hard to generalize and require suitable training data. Motivated by these limitations, we propose an unsupervised approach to discovering and categorizing semantic content in a composite audio stream. We first employ spectral clustering to discover natural semantic sound clusters (e.g., speech, music, noise, applause, and speech mixed with music) in the analyzed data stream. These clusters are referred to as audio elements. From the obtained set of audio elements, we select the key audio elements, i.e., those most prominent in characterizing the content of the input audio data, and use them to detect potential boundaries of semantic audio segments, denoted as auditory scenes. Finally, the auditory scenes are categorized in terms of the audio elements appearing in them. The categorization is inferred from the relations between audio elements and auditory scenes using an information-theoretic co-clustering scheme. Evaluations on 4 hours of diverse audio data indicate that promising results can be achieved in both audio element discovery and auditory scene categorization.
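
The final categorization step can be illustrated with a small sketch. Information-theoretic co-clustering looks for groupings of rows and columns of a co-occurrence table that preserve as much mutual information as possible. The Python below only evaluates that objective for two hand-picked groupings; it does not perform the search, and it is not the authors' implementation. The element labels, scene contents, groupings, and all function names are assumptions made up for illustration, since the abstract gives no such details.

```python
import math

def joint_distribution(scenes, elements):
    """Normalized co-occurrence table p(element, scene) from labelled scenes."""
    index = {name: i for i, name in enumerate(elements)}
    counts = [[0] * len(scenes) for _ in elements]
    for j, scene in enumerate(scenes):
        for label in scene:
            counts[index[label]][j] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def mutual_information(p):
    """Mutual information (in bits) between the rows and columns of p."""
    row = [sum(r) for r in p]
    col = [sum(p[i][j] for i in range(len(p))) for j in range(len(p[0]))]
    mi = 0.0
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            if pij > 0:
                mi += pij * math.log2(pij / (row[i] * col[j]))
    return mi

def merge(p, row_groups, col_groups):
    """Sum p over the given row clusters and column clusters."""
    q = [[0.0] * (max(col_groups) + 1) for _ in range(max(row_groups) + 1)]
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            q[row_groups[i]][col_groups[j]] += pij
    return q

# Hypothetical toy data: four audio elements, four auditory scenes.
elements = ["speech", "music", "applause", "noise"]
scenes = [
    ["speech", "speech", "music"],
    ["speech", "speech", "speech"],
    ["applause", "music", "applause"],
    ["applause", "applause", "noise"],
]

p = joint_distribution(scenes, elements)
full_mi = mutual_information(p)

# Element grouping (assumed): {speech, music} vs {applause, noise}.
element_groups = [0, 0, 1, 1]
# Two candidate scene groupings: one matching the content, one arbitrary.
good_loss = full_mi - mutual_information(merge(p, element_groups, [0, 0, 1, 1]))
bad_loss = full_mi - mutual_information(merge(p, element_groups, [0, 1, 0, 1]))

print(f"information lost, matching grouping:  {good_loss:.3f} bits")
print(f"information lost, arbitrary grouping: {bad_loss:.3f} bits")
```

Grouping the scenes by the elements they share loses less mutual information than the arbitrary grouping, which is the criterion a co-clustering search would use to prefer one categorization over another.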