Violent scene detection using mid-level feature

  • Authors:
  • Vu Lam; Sang Phan; Thanh Duc Ngo; Duy-Dinh Le; Duc Anh Duong; Shin'ichi Satoh

  • Affiliations:
  • University of Science, Ho Chi Minh, Vietnam; The Graduate University for Advanced Studies (Sokendai), Tokyo, Japan; The Graduate University for Advanced Studies (Sokendai), Tokyo, Japan; National Institute of Informatics, Tokyo, Japan; University of Information Technology, Ho Chi Minh, Vietnam; National Institute of Informatics, Tokyo, Japan

  • Venue:
  • Proceedings of the Fourth Symposium on Information and Communication Technology
  • Year:
  • 2013

Abstract

Violent scene detection (VSD) is the task of detecting shots that contain violent scenes in videos. With a wide range of promising real-world applications (e.g., movie inspection, video on demand, semantic video indexing and retrieval), VSD has become an important research problem. A typical approach to VSD is to learn a violent scene classifier and then apply it to video shots. Finding a good feature representation for video shots is therefore essential to achieving high classification accuracy. Recent work has shown that using low-level features directly results in disappointing performance, since low-level features cannot convey the high-level semantic information needed to represent the concept of violence. In this paper, we propose using mid-level features to narrow the semantic gap between low-level features and the concept of violence. The mid-level feature of a training (or test) video shot is formed by concatenating the scores returned by attribute classifiers. Attributes related to the concept of violence are manually defined. Compared with the original violence concept, the attributes have a smaller gap to the low-level features. Each attribute classifier is trained on low-level features. We conduct experiments on the MediaEval VSD benchmark dataset. The results show that, by using mid-level features, our proposed method outperforms the standard approach that directly uses low-level features.
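The two-stage pipeline described in the abstract (attribute classifiers trained on low-level features, their scores concatenated into a mid-level feature, and a violence classifier trained on that feature) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 2-D "low-level" vectors, the hypothetical attribute names `attribute_a` and `attribute_b`, and the use of a small logistic-regression scorer in place of whatever classifier the authors used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Return a score in [0, 1] for feature vector x under a (weights, bias) model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X, y, epochs=300, lr=0.5):
    """Fit a logistic-regression scorer with plain stochastic gradient descent.

    Assumption: this stands in for the (unspecified) classifier in the paper.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = predict((w, b), x) - t  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


# Toy low-level features for six shots (hypothetical 2-D vectors).
low_level = [
    [0.9, 0.1], [0.8, 0.2],  # shots showing attribute_a
    [0.1, 0.9], [0.2, 0.8],  # shots showing attribute_b
    [0.1, 0.1], [0.2, 0.2],  # shots showing neither
]

# Manually defined attributes (hypothetical names) with per-shot labels.
attribute_labels = {
    "attribute_a": [1, 1, 0, 0, 0, 0],
    "attribute_b": [0, 0, 1, 1, 0, 0],
}
violence_labels = [1, 1, 1, 1, 0, 0]

# Stage 1: one attribute classifier per attribute, trained on low-level features.
attribute_names = sorted(attribute_labels)
attribute_models = [
    train_logistic(low_level, attribute_labels[name]) for name in attribute_names
]


def mid_level_feature(x):
    """Concatenate the attribute-classifier scores for one shot."""
    return [predict(model, x) for model in attribute_models]


# Stage 2: the violence classifier is trained on the mid-level features.
mid_level = [mid_level_feature(x) for x in low_level]
violence_model = train_logistic(mid_level, violence_labels)

# Scoring an unseen shot goes through the same two stages.
test_shot = [0.85, 0.15]
violence_score = predict(violence_model, mid_level_feature(test_shot))
print(attribute_names, mid_level_feature(test_shot), violence_score)
```

In this sketch the mid-level feature has one dimension per attribute, so its length grows with the number of manually defined attributes rather than with the dimensionality of the low-level descriptor.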