Violent scene detection using mid-level feature

Authors:
Vu Lam;Sang Phan;Thanh Duc Ngo;Duy-Dinh Le;Duc Anh Duong;Shin'ichi Satoh
Affiliations:
University of Science, Ho Chi Minh, Vietnam;The Graduate University for Advanced Studies (Sokendai), Tokyo, Japan;The Graduate University for Advanced Studies (Sokendai), Tokyo, Japan;National Institute of Informatics, Tokyo, Japan;University of Information Technology, Ho Chi Minh, Vietnam;National Institute of Informatics, Tokyo, Japan
Venue:
Proceedings of the Fourth Symposium on Information and Communication Technology
Year:
2013


Abstract

Violent scene detection (VSD) is the task of detecting shots that contain violent scenes in videos. With a wide range of promising real-world applications (e.g. movie inspection, video on demand, semantic video indexing and retrieval), VSD has become an important research problem. A typical approach to VSD is to learn a violent scene classifier and then apply it to video shots. Finding a good feature representation for video shots is therefore essential to achieving high classification accuracy. Recent work has shown that using low-level features results in disappointing performance, since low-level features cannot convey the high-level semantic information needed to represent the concept of violence. In this paper, we propose using mid-level features to narrow the semantic gap between low-level features and the violence concept. The mid-level feature of a training (or test) video shot is formed by concatenating the scores returned by attribute classifiers. Attributes related to the violence concept are manually defined. Compared to the original violence concept, the attributes have a smaller gap to the low-level features. Each attribute classifier is trained on low-level features. We conduct experiments on the MediaEval VSD benchmark dataset. The results show that, by using mid-level features, our proposed method outperforms the standard approach that uses low-level features directly.
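The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the attribute names (`blood`, `fire`, `fight`), the synthetic data, the feature dimensions, and the use of a small hand-written logistic regression as a stand-in classifier. The abstract does not specify which classifier or which attributes the authors used.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit a plain logistic-regression classifier with batch gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def score(w, X):
    """Return a probability-like score in [0, 1] for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def mid_level_features(attribute_models, X_low):
    """Concatenate one score per attribute classifier into a mid-level vector."""
    return np.column_stack([score(w, X_low) for w in attribute_models])


# Synthetic stand-in data: 200 shots with 16-dimensional low-level features.
rng = np.random.default_rng(0)
X_low = rng.normal(size=(200, 16))

# Hypothetical attribute names with synthetic per-shot attribute labels.
attribute_names = ["blood", "fire", "fight"]
true_dirs = rng.normal(size=(16, len(attribute_names)))
attr_labels = (X_low @ true_dirs > 0).astype(float)

# Synthetic violence label: a shot counts as violent if two or more
# of the hypothetical attributes are present.
violence = (attr_labels.sum(axis=1) >= 2).astype(float)

# Stage 1: train one attribute classifier per attribute on low-level features.
attribute_models = [
    train_logistic(X_low, attr_labels[:, k]) for k in range(len(attribute_names))
]

# Stage 2: concatenate attribute scores into the mid-level representation,
# then train the violence classifier on that representation.
X_mid = mid_level_features(attribute_models, X_low)
violence_model = train_logistic(X_mid, violence)
violence_scores = score(violence_model, X_mid)
```

In this sketch each shot ends up with a mid-level vector whose length equals the number of attributes, so the violence classifier operates on attribute scores rather than on the raw low-level features. A real evaluation would score held-out shots instead of the training shots used here.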