Automatic detection of malicious sound using segmental two-dimensional mel-frequency cepstral coefficients and histograms of oriented gradients

Authors:
Myung Jong Kim;Younggwan Kim;JaeDeok Lim;Hoirin Kim
Affiliations:
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea;Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea;Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea;Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Venue:
Proceedings of the international conference on Multimedia
Year:
2010

Abstract

This paper addresses the problem of recognizing malicious sounds, such as sexual screams or moans, in order to detect and block objectionable multimedia content. Such sounds are characterized by large temporal variations and fast spectral transitions, so extracting features that properly represent these characteristics is important for achieving better performance. We employ segment-based two-dimensional Mel-frequency cepstral coefficients and histograms of gradient directions as a feature set that captures both the temporal variations and the spectral transitions within a long-range segment of the target signal. Gaussian mixture models (GMMs) are adopted to statistically represent the malicious and non-malicious sounds, and test sounds are classified with a maximum a posteriori (MAP) decision rule. Evaluated on a database of several hundred malicious and non-malicious sound clips, the proposed feature extraction method yielded a precision of 91.31% and a recall of 94.27%. This result suggests that the approach could be used as an alternative to image-based methods.
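
The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes, not the authors' method. It assumes a segment is available as a matrix of log mel-filterbank energies (bands by frames), computes two-dimensional cepstral coefficients as a 2-D DCT of that matrix, builds a magnitude-weighted histogram of gradient directions over the same matrix, and applies a two-class MAP decision to log-likelihoods that would come from the two GMMs. All function names, the numbers of retained coefficients and histogram bins, and the prior are hypothetical choices made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]  # coefficient index
    i = np.arange(n)[None, :]  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m


def segment_2d_cepstrum(log_mel: np.ndarray, n_quef: int = 4, n_mod: int = 3) -> np.ndarray:
    """2-D cepstral features for one segment (assumed formulation).

    log_mel has shape (n_bands, n_frames). The DCT over mel bands gives
    per-frame cepstra; the DCT over frames summarizes how each cepstral
    coefficient varies across the segment. Only the low-order block is kept.
    """
    n_bands, n_frames = log_mel.shape
    coeffs = dct_matrix(n_bands) @ log_mel @ dct_matrix(n_frames).T
    return coeffs[:n_quef, :n_mod].ravel()


def gradient_direction_histogram(log_mel: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Magnitude-weighted histogram of unsigned gradient directions."""
    d_freq, d_time = np.gradient(log_mel)  # axis 0 = bands, axis 1 = frames
    magnitude = np.hypot(d_time, d_freq)
    angle = np.mod(np.arctan2(d_freq, d_time), np.pi)  # fold into [0, pi)
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist


def map_decision(loglik_malicious: float, loglik_benign: float,
                 prior_malicious: float = 0.5) -> str:
    """Two-class MAP rule on class log-likelihoods (e.g. from two GMMs)."""
    log_post_mal = loglik_malicious + np.log(prior_malicious)
    log_post_ben = loglik_benign + np.log(1.0 - prior_malicious)
    return "malicious" if log_post_mal > log_post_ben else "non-malicious"
```

In such a setup, the two feature vectors for a segment would typically be concatenated and scored under each class model, with the prior in `map_decision` controlling the trade-off between precision and recall.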