Multi-modality web video categorization

  • Authors: Linjun Yang; Jiemin Liu; Xiaokang Yang; Xian-Sheng Hua
  • Affiliations: Microsoft Research Asia, Beijing, China; Shanghai Jiaotong University, Shanghai, China; Shanghai Jiaotong University, Shanghai, China; Microsoft Research Asia, Beijing, China
  • Venue: Proceedings of the International Workshop on Multimedia Information Retrieval (MIR '07)
  • Year: 2007

Abstract

This paper reports the first comprehensive study and large-scale test of web video (also known as user-generated video or micro video) categorization. Observing that web videos exhibit far greater diversity of quality, subject, style, and genre than traditional video programs, we focus on the effectiveness of different modalities in handling this high variation. Specifically, we propose two novel modalities, a semantic modality and a surrounding-text modality, as effective complements to the most commonly used low-level features. The semantic modality comprises three feature representations: concept histogram, visual word vector model, and visual word Latent Semantic Analysis (LSA); the text modality covers the titles, descriptions, and tags of web videos. We conduct a set of comprehensive experiments to evaluate the proposed feature representations with three classifiers: Support Vector Machine (SVM), Gaussian Mixture Model (GMM), and Manifold Ranking (MR). Our experiments on a large-scale dataset of 11k web videos (nearly 450 hours in total) demonstrate that (1) the proposed multimodal feature representation is effective for web video categorization, and (2) SVM outperforms GMM and MR on nearly all modalities.
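To make the pipeline described in the abstract concrete, the sketch below shows one plausible instantiation, not the authors' implementation: visual-word counts reduced by LSA (truncated SVD), fused with a bag-of-words representation of the surrounding text, and classified by a linear SVM. The vocabulary size, latent dimension, synthetic data, and use of scikit-learn are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): visual-word LSA features fused with
# surrounding-text features, classified by a linear SVM.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's 11k-video dataset (assumption).
n_videos = 200
labels = rng.integers(0, 2, n_videos)                          # two genres for brevity
visual_counts = csr_matrix(rng.poisson(1.0, (n_videos, 500)))  # visual-word counts
texts = ["funny cat clip pets home video" if y == 1
         else "news report politics interview" for y in labels]  # title/tags proxy

# Semantic modality: LSA (truncated SVD) over the visual-word vectors.
visual_lsa = TruncatedSVD(n_components=50, random_state=0).fit_transform(visual_counts)

# Surrounding-text modality: bag of words over titles, descriptions, and tags.
text_feats = TfidfVectorizer().fit_transform(texts)

# Early fusion by concatenation, then an SVM classifier.
features = hstack([csr_matrix(visual_lsa), text_feats], format="csr")
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Concatenation is only the simplest fusion strategy; evaluating each modality with its own classifier and combining the scores (late fusion) is an equally plausible reading of the setup, and the abstract's per-modality comparison of SVM, GMM, and MR suggests the modalities were also assessed separately.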