In the automated surveillance field, automatic scene analysis and understanding systems typically consider only visual information, while other modalities, such as audio, are disregarded. This paper presents a new method that integrates audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. These processes detect separate audio and visual patterns representing unusual unimodal events in a scene. Audio and visual data are then integrated by exploiting the concept of synchrony between such events. The audio-visual (AV) association is carried out online, without the need for training sequences, and is based on the computation of a characteristic feature called the audio-video concurrence matrix, which allows the detection and segmentation of AV events, as well as discrimination between them. Experimental tests involving classification and clustering of events demonstrate the potential of the proposed approach, also in comparison with results obtained using the single modalities alone and without considering synchrony.
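The abstract does not spell out how the audio-video concurrence matrix is built, but the underlying idea of counting synchronous unimodal events can be illustrated with a minimal sketch. Assuming (hypothetically) that each frame carries an integer event label per modality, with -1 denoting background, the matrix accumulates how often each audio event type co-occurs with each visual event type:

```python
import numpy as np

def concurrence_matrix(audio_events, video_events, num_audio, num_video):
    """Count per-frame co-occurrences of audio and visual event labels.

    audio_events, video_events: one integer label per frame (-1 = background).
    Returns a (num_audio x num_video) matrix C where C[a, v] is the number
    of frames in which audio event a and visual event v occur together.
    """
    C = np.zeros((num_audio, num_video), dtype=int)
    for a, v in zip(audio_events, video_events):
        if a >= 0 and v >= 0:  # both modalities show foreground activity
            C[a, v] += 1
    return C

# Toy example: 8 frames, 2 audio and 2 visual event types
audio = [-1, 0, 0, 1, -1, 1, 0, -1]
video = [0, 0, 1, 1, 1, 1, 0, -1]
print(concurrence_matrix(audio, video, 2, 2))
# → [[2 1]
#    [0 2]]
```

A strong diagonal (or any dominant cell) in such a matrix indicates audio and visual events that tend to fire synchronously, which is the cue the paper exploits for AV association and event discrimination.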