CASSANDRA: audio-video sensor fusion for aggression detection

Authors:
W. Zajdel;J. D. Krijnders;T. Andringa;D. M. Gavrila
Affiliations:
Intelligent Systems Laboratory, Faculty of Science, University of Amsterdam, The Netherlands;Auditory Cognition Group, Artificial Intelligence, Rijksuniversiteit Groningen, The Netherlands;Auditory Cognition Group, Artificial Intelligence, Rijksuniversiteit Groningen, The Netherlands;Intelligent Systems Laboratory, Faculty of Science, University of Amsterdam, The Netherlands
Venue:
AVSS '07 Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance
Year:
2007

Citing 0
Cited 14

Decision-Level Fusion for Audio-Visual Laughter Detection

MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction
Acoustic Based Surveillance System for Intrusion Detection

AVSS '09 Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance
Distributed activity recognition using key sensors

ICACT'09 Proceedings of the 11th international conference on Advanced Communication Technology - Volume 3
A bottom-up approach of fusion of events in surveillance systems

CompSysTech '09 Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing
Sound event recognition through expectancy-based evaluation of signal-driven hypotheses

Pattern Recognition Letters
Color based tracing in real-life surveillance data

Transactions on data hiding and multimedia security V
Risk analysis of a video-surveillance system

Proceedings of the 12th International Conference on Computer Systems and Technologies
Addressing multimodality in overt aggression detection

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Violence detection in video using computer vision techniques

CAIP'11 Proceedings of the 14th international conference on Computer analysis of images and patterns - Volume Part II
Detecting F-formations as dominant sets

ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Audio-Visual fusion for detecting violent scenes in videos

SETN'10 Proceedings of the 6th Hellenic conference on Artificial Intelligence: theories, models and applications
A naive mid-level concept-based fusion approach to violence detection in Hollywood movies

Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
An evidential fusion approach for activity recognition in ambient intelligence environments

Robotics and Autonomous Systems
A comparative study on automatic audio-visual fusion for aggression detection using meta-information

Pattern Recognition Letters

Abstract

This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate descriptors of a scene, such as “scream”, “passing train” or “articulation energy”. At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene. Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting.