A flexible framework for key audio effects detection and auditory context inference

Authors:
R. Cai;Lie Lu;A. Hanjalic;Hong-Jiang Zhang;Lian-Hong Cai
Affiliations:
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China;-;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 19

Unsupervised content discovery in composite audio
Proceedings of the 13th annual ACM international conference on Multimedia

Towards optimal audio "keywords" detection for audio content analysis and discovery
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia

A Novel Video Classification Method Based on Hybrid Generative/Discriminative Models
SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition

Detecting Violent Scenes in Movies by Auditory and Visual Cues
PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing

Characteristics-based effective applause detection for meeting speech
Signal Processing

Unstructured audio classification for environment recognition
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3

Semantic concept annotation based on audio PLSA model
MM '09 Proceedings of the 17th ACM international conference on Multimedia

GBED: group based event detection method for audio sensor networks
MM '09 Proceedings of the 17th ACM international conference on Multimedia

Environmental sound recognition with time-frequency audio features
IEEE Transactions on Audio, Speech, and Language Processing

Text-like segmentation of general audio for content-based retrieval
IEEE Transactions on Multimedia

Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training
PCM '09 Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing

Audio contributions to semantic video search
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo

The participation payoff: challenges and opportunities for multimedia access in networked communities
Proceedings of the international conference on Multimedia information retrieval

Utilizing affective analysis for efficient movie browsing
ICIP'09 Proceedings of the 16th IEEE international conference on Image processing

Hierarchical keyframe-based video summarization using QR-decomposition and modified k-means clustering
EURASIP Journal on Advances in Signal Processing

Environmental sound classification for scene recognition using local discriminant bases and HMM
MM '11 Proceedings of the 19th ACM international conference on Multimedia

ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming
Autonomous Robots

A context aware sound classifier applied to prawn feed monitoring and energy disaggregation
Knowledge-Based Systems

Fusing audio vocabulary with visual features for pornographic video detection
Future Generation Computer Systems

Abstract

Key audio effects are those special effects that play critical roles in human perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect the various models and fully explore the transitions among them. Moreover, a set of new spectral features is employed to improve the representation of each audio effect and the discrimination among various effects. The framework makes it convenient to add or remove target audio effects for different applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework achieves satisfactory results on both key audio effect detection and auditory context inference.
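
The last step of the abstract, inferring an auditory context from a detected key effect sequence, can be illustrated with a minimal sketch. It is not the authors' model: it assumes the simplest possible Bayesian network, in which one context node is the parent of every detected effect (a naive Bayes structure). The context names, effect names, and all probabilities below are hypothetical values chosen for illustration; in practice the prior would encode prior knowledge and the likelihoods would be learned from labeled data.

```python
import math

# Hypothetical prior over contexts (stands in for "prior knowledge").
PRIOR = {"excitement": 0.5, "violence": 0.5}

# Hypothetical P(effect | context) (stands in for "statistical learning").
LIKELIHOOD = {
    "excitement": {"applause": 0.5, "laughter": 0.4, "gunshot": 0.1},
    "violence": {"applause": 0.1, "laughter": 0.1, "gunshot": 0.8},
}


def infer_context(effects):
    """Return the posterior P(context | effects) as a dict.

    Assumes effects are conditionally independent given the context,
    which is a simplification made for this sketch.
    """
    log_scores = {}
    for context, prior in PRIOR.items():
        score = math.log(prior)
        for effect in effects:
            score += math.log(LIKELIHOOD[context][effect])
        log_scores[context] = score

    # Normalize in a numerically stable way by subtracting the maximum.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# A detected key effect sequence, e.g. the output of an upstream detector.
posterior = infer_context(["gunshot", "gunshot", "applause"])
best = max(posterior, key=posterior.get)
```

Working in log space keeps the product of many small likelihoods from underflowing on long effect sequences. A fuller Bayesian network could add dependencies between effects or temporal structure, which this sketch deliberately omits.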