Multimodal genre classification of TV programs and YouTube videos

Authors:
Hazım Kemal Ekenel;Tomas Semela
Affiliations:
Institute of Anthropomatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;Institute of Anthropomatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Venue:
Multimedia Tools and Applications
Year:
2013

References:

Color indexing. International Journal of Computer Vision.
Automatic recognition of film genres. Proceedings of the Third ACM International Conference on Multimedia.
Image Indexing Using Color Correlograms. CVPR '97: Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition.
Robust Real-Time Face Detection. International Journal of Computer Vision.
Probability Estimates for Multi-class Classification by Pairwise Coupling. The Journal of Machine Learning Research.
Automatic Sports Video Genre Classification using Pseudo-2D-HMM. ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, Volume 4.
Real-time discrimination of broadcast speech/music. ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2.
A note on Platt's probabilistic outputs for support vector machines. Machine Learning.
Multi-modality web video categorization. Proceedings of the International Workshop on Multimedia Information Retrieval.
TV Genre Classification Using Multimodal Information and Multilayer Perceptrons. AI*IA '07: Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence (AI*IA 2007: Artificial Intelligence and Human-Oriented Computing).
Parallel neural networks for multimodal video genre classification. Multimedia Tools and Applications.
Towards google challenge: combining contextual and social information for web video categorization. MM '09: Proceedings of the 17th ACM International Conference on Multimedia.
TubeFiler: an automatic web video categorizer. MM '09: Proceedings of the 17th ACM International Conference on Multimedia.
Google challenge: incremental-learning for web video categorization on robust semantic feature space. MM '09: Proceedings of the 17th ACM International Conference on Multimedia.
LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST).

Abstract

This paper presents an automatic video genre classification system that uses several low-level audio-visual features, cognitive and structural information, and, in the case of web videos, tag-based features to classify the types of TV programs and YouTube videos. Classification is performed using an ensemble of support vector machines. The visual descriptors consist of color and texture-based features, which are often used to represent the concepts appearing in a video. The audio descriptors are signal energy, zero crossing rate, fundamental frequency, and mel-frequency cepstral coefficients, which represent a wide range of perceptual cues available in the audio signal. Cognitive descriptors correspond to the information derived from a face detector, whereas structural descriptors are related to the shot editing of the video. A tag descriptor, based on the term frequency-inverse document frequency (TF-IDF) measure, is additionally used for the genre classification of YouTube videos. For each feature and each genre, a separate support vector machine classifier is trained following the one-vs-all scheme. The outputs of the classifiers are then combined to yield the final classification result. The proposed system is extensively evaluated using complete TV programs from the Italian RAI TV channel, complete TV programs from a French TV channel, and videos from YouTube. Using only the audio-visual cues together with cognitive and structural information, correct classification rates of 99.2%, 94.5%, and 87.3% are attained on these three data sets, respectively. These results show that the developed system can reliably determine the genre of TV programs. Adding the tag feature to the content-based features increases the YouTube genre classification performance from 87.3% to 89.7%. Further experiments indicate that the quality of the videos does not influence the results significantly.
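As an illustration of the tag descriptor, the following minimal sketch computes TF-IDF vectors from per-video tag lists. The abstract does not state which TF-IDF variant the system uses, so the weighting here (relative term frequency times the natural log of N/df) is an assumption, and the function name `tfidf_vectors` and the example tags are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(tag_lists):
    """Return (vocabulary, vectors) with one TF-IDF vector per video.

    Assumed weighting (not taken from the paper):
    tf = count / number of tags in the video, idf = ln(N / df).
    """
    n_docs = len(tag_lists)
    # Document frequency: number of videos whose tag list contains the term.
    df = Counter(term for tags in tag_lists for term in set(tags))
    vocabulary = sorted(df)
    vectors = []
    for tags in tag_lists:
        counts = Counter(tags)
        total = len(tags) or 1  # avoid division by zero for untagged videos
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical tag lists for three videos (illustrative data only).
videos = [
    ["football", "goal", "highlights"],
    ["news", "politics", "interview"],
    ["football", "interview"],
]
vocabulary, vectors = tfidf_vectors(videos)
```

Under this weighting, a tag that occurs in every video receives weight zero, so only tags that discriminate between videos contribute to the descriptor.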
The performance drop in classifying the genres of YouTube videos is found to be mainly due to the large variety of content in these videos. In summary, this study shows that the proposed low-level visual feature set, which we have used to represent the concepts appearing in a video, also provides robust cues for genre classification. In addition, the obtained genre information is expected to provide additional cues that can be used to improve the performance of the concept detection system. It has also been shown that an ensemble of support vector machine classifiers outperforms the neural network based classification proposed in the previous state-of-the-art genre classification systems (Montagnuolo and Messina, AI*IA, LNAI 4733:730-741, 2007; Multimed Tools Appl 41(1):125-159, 2009). Besides the improvement in the employed feature set and classification scheme, the experimental framework of the study is exemplary, with extensive tests conducted on different domains ranging from TV programs from different countries to web videos.
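The combination step of the per-feature, one-vs-all ensemble can be sketched as follows. The abstract only says that the classifier outputs are combined; it does not name the rule, so the sum (average) rule below is an assumption. The scores stand in for probability-like outputs of one classifier per feature and genre, and the genre names, score values, and the function name `fuse_scores` are hypothetical.

```python
def fuse_scores(scores_by_feature, genres):
    """Average per-genre scores over all features.

    Returns (best_genre, fused), where fused maps each genre to its mean score.
    The averaging rule is an assumption, not the paper's stated method.
    """
    fused = {}
    for genre in genres:
        values = [scores[genre] for scores in scores_by_feature.values()]
        fused[genre] = sum(values) / len(values)
    best_genre = max(genres, key=lambda g: fused[g])
    return best_genre, fused


# Hypothetical one-vs-all outputs for a single video (illustrative values only).
genres = ["news", "sports", "music"]
scores_by_feature = {
    "color":     {"news": 0.60, "sports": 0.30, "music": 0.10},
    "audio":     {"news": 0.20, "sports": 0.50, "music": 0.30},
    "structure": {"news": 0.50, "sports": 0.40, "music": 0.10},
}
predicted, fused = fuse_scores(scores_by_feature, genres)
```

In this toy input the audio classifier alone would pick "sports", but the averaged scores favor "news", which shows how fusing several feature-specific classifiers can override a single modality.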