Pornography detection in video benefits (a lot) from a multi-modal approach
Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis
The traditional approach to filtering pornographic videos on the Internet relies on visual features extracted from key frames. However, this approach cannot meet users' needs owing to the proliferation of low-resolution videos. To improve filtering performance, we propose a novel framework that fuses audio-words with visual features for pornographic video detection. Our intention is not only to fuse the two modalities, visual images and audio signals, but also to narrow the semantic gap between low-level features and high-level concepts by using "audio-words" as a mid-level feature. To further improve performance, we present a segmentation algorithm based on units of the energy envelope and a decision algorithm based on periodic patterns. The results show that our approach outperforms the traditional one based on visual features and achieves satisfactory performance. Moreover, the proposed segmentation algorithm outperforms the conventional one, which uses fixed-length segments, and the proposed decision algorithm outperforms the conventional one, which uses thresholds.
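The abstract does not specify how the energy-envelope segmentation or the periodic-pattern decision is computed, so the following is only a minimal sketch of the general idea, not the authors' algorithms. It assumes short-time energy as the envelope, a relative energy cutoff to delimit segments, and autocorrelation of the envelope as a periodicity measure. All function names, parameters, and the synthetic test signal are hypothetical.

```python
import numpy as np

def energy_envelope(signal, frame_len=400, hop=200):
    """Short-time energy per frame, used here as a simple energy envelope."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def segment_by_envelope(envelope, rel_threshold=0.1):
    """Return (start, end) frame pairs, end exclusive, for each run of
    frames whose energy exceeds rel_threshold times the peak energy.
    (Assumption: the paper's envelope units may be defined differently.)"""
    active = envelope > rel_threshold * envelope.max()
    padded = np.concatenate(([False], active, [False]))
    # Indices where the active flag flips: rises and falls alternate.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]

def dominant_period(envelope, min_lag=2):
    """Return (lag, strength): the lag in frames with the highest
    normalized autocorrelation, searched over min_lag .. len/2."""
    x = envelope - envelope.mean()
    max_lag = len(x) // 2
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    if ac[0] <= 0 or max_lag <= min_lag:
        return None, 0.0
    ac = ac / ac[0]
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
    return lag, float(ac[lag])

# Synthetic example: six 0.25 s tone bursts, each followed by 0.25 s of silence.
sr = 8000
t = np.arange(sr // 4) / sr
burst = np.sin(2 * np.pi * 440 * t)
signal = np.tile(np.concatenate([burst, np.zeros(sr // 4)]), 6)

envelope = energy_envelope(signal)
segments = segment_by_envelope(envelope)
lag, strength = dominant_period(envelope)
print(segments)
print(lag, strength)
```

In this sketch, segment boundaries follow the signal's energy rather than a fixed window length, and a strong autocorrelation peak at a nonzero lag indicates a repeating pattern. How the paper turns such periodicity into a final decision is not stated in the abstract.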