Fusing audio vocabulary with visual features for pornographic video detection

Authors:
Yizhi Liu;Ying Yang;Hongtao Xie;Sheng Tang
Affiliations:
Key Laboratory of Knowledge Processing and Networked Manufacturing, College of Hunan Province, Xiangtan, China;College of Information and Electrical Engineering, China Agricultural University, Beijing, China;Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
Future Generation Computer Systems
Year:
2014

Abstract

Pornographic video detection based on multimodal fusion is an effective approach to filtering pornography. However, existing methods lack an accurate representation of audio semantics and pay little attention to the characteristics of pornographic audio. In this paper, we propose a novel framework that fuses an audio vocabulary with visual features for pornographic video detection. The novelty of our approach lies in three aspects: an audio semantics representation method based on the energy envelope unit (EEU) and bag-of-words (BoW), a periodicity-based audio segmentation algorithm, and a periodicity-based video decision algorithm. The first, named the EEU+BoW representation method, describes audio semantics via an audio vocabulary, which is constructed by k-means clustering of EEUs. The latter two complement each other to make full use of the periodicities in pornographic audio. The periodicity-based audio segmentation algorithm divides audio streams into EEU sequences. After these EEUs are classified, the periodicity-based video decision algorithm judges whether a video is pornographic. Before fusion, two support vector machines are applied separately, one for the audio-vocabulary-based method and one for the visual-features-based method. To fuse their results, a keyframe is selected from each EEU according to its beginning and ending positions, and then an integrated weighted scheme and the periodicity-based video decision algorithm are applied to yield the final detection results. Experimental results show that our approach outperforms the traditional approach based only on visual features: the true positive rate reaches 94.44% while the false positive rate is 9.76%.
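The abstract names k-means clustering of EEUs as the way the audio vocabulary is built, and a weighted scheme as the way audio and visual results are combined, but it gives no further detail. The Python sketch below is only an illustration of those two ideas under stated assumptions: each EEU is taken to be already summarized as a fixed-length feature vector, the two classifier outputs are taken to be scores in [0, 1], and the fusion is a plain convex combination. The function names (`kmeans`, `nearest`, `bow_histogram`, `fuse`), the weight `w_audio`, and the toy data are all hypothetical and do not come from the paper. The periodicity-based segmentation and decision algorithms are not reproduced here.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids play the role of 'audio words'."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(eeu_features, centroids):
    """Normalized count of how often each audio word occurs in a clip."""
    counts = [0] * len(centroids)
    for f in eeu_features:
        counts[nearest(f, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def fuse(audio_score, visual_score, w_audio=0.5):
    """Assumed late fusion: convex combination of two scores in [0, 1]."""
    return w_audio * audio_score + (1 - w_audio) * visual_score


# Toy 2-D "EEU features" (hypothetical); real features would be richer.
training_eeus = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = kmeans(training_eeus, k=2)
clip_histogram = bow_histogram([(0.2, 0.1), (10.5, 10.2)], vocabulary)
fused_score = fuse(audio_score=0.9, visual_score=0.5, w_audio=0.6)
```

In this sketch the histogram would be the input to the audio classifier, and `fuse` would be applied to the audio and visual scores for each EEU and its keyframe. How the weights are actually chosen, and how per-EEU results are aggregated into a video-level decision, is left to the paper.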