AMVA'12: ACM international workshop on audio and multimedia methods for large-scale video analysis
Proceedings of the 20th ACM international conference on Multimedia
We address the challenge of detecting pornographic content in video streams. Using offensive material crawled from different pornographic websites and non-offensive clips from YouTube (500 hours of video in total), we first study a compressed-domain activity descriptor based on MPEG motion compensation vectors. We show that this descriptor offers an interesting alternative but generalizes poorly between videos compressed with different codecs, a problem that can be overcome to some extent by adding noise to the image data prior to video compression. Our main contribution is an evaluation that benchmarks the above motion-based descriptor as well as three other widely used features (audio-based MFCC features, skin color detection, and visual words). Here, we show that a multi-modal approach is a key strategy for accurate detection of adult content: a combination of the different features gives considerable improvements in accuracy, reducing the equal error rate by 36-56% compared to the best uni-modal system.
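To make the evaluation measure concrete, the following is a minimal sketch of how an equal error rate can be computed and how per-modality scores might be combined. The abstract does not say how the features are fused, so the weighted averaging in `fuse_scores` is an assumption chosen for illustration, not the authors' method. The function names and the toy scores are hypothetical and are not results from the paper.

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate (EER) by sweeping thresholds.

    scores: higher means "more likely adult content" (assumed convention).
    labels: 1 for offensive clips, 0 for non-offensive clips.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one clip of each class")
    best_gap, eer = None, None
    # Try each observed score as a threshold (predict 1 if score >= t).
    for t in sorted(set(scores)) + [float("inf")]:
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr = false_pos / negatives
        fnr = false_neg / positives
        gap = abs(fpr - fnr)
        # Keep the operating point where both error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def fuse_scores(per_modality_scores, weights=None):
    """Late fusion (assumed): weighted average of per-modality clip scores."""
    n_modalities = len(per_modality_scores)
    if weights is None:
        weights = [1.0 / n_modalities] * n_modalities
    n_clips = len(per_modality_scores[0])
    return [
        sum(w * m[i] for w, m in zip(weights, per_modality_scores))
        for i in range(n_clips)
    ]


# Hypothetical toy data: six clips, two modalities (made-up numbers).
labels = [1, 1, 1, 0, 0, 0]
motion = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
audio = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]

motion_eer = equal_error_rate(motion, labels)
audio_eer = equal_error_rate(audio, labels)
fused_eer = equal_error_rate(fuse_scores([motion, audio]), labels)
print(motion_eer, audio_eer, fused_eer)
```

In this toy setup each modality misranks a different clip, so averaging the scores separates the two classes better than either modality alone. That is the intuition behind combining complementary features, though real gains depend on the data and the fusion scheme.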