Multi-modality web video categorization

  • Authors: Linjun Yang; Jiemin Liu; Xiaokang Yang; Xian-Sheng Hua
  • Affiliations: Microsoft Research Asia, Beijing, China; Shanghai Jiaotong University, Shanghai, China; Shanghai Jiaotong University, Shanghai, China; Microsoft Research Asia, Beijing, China
  • Venue: Proceedings of the International Workshop on Multimedia Information Retrieval (MIR '07)
  • Year: 2007

Abstract

This paper reports the first comprehensive study and large-scale test of web video (also known as user-generated video or micro video) categorization. Observing that web videos exhibit far greater diversity of quality, subject, style, and genre than traditional video programs, we focus on the effectiveness of different modalities in handling this high variation. Specifically, we propose two novel modalities, a semantic modality and a surrounding-text modality, as effective complements to the most commonly used low-level features. The semantic modality comprises three feature representations: concept histogram, visual word vector model, and visual word Latent Semantic Analysis (LSA); the text modality covers the titles, descriptions, and tags of web videos. We conduct a set of comprehensive experiments to evaluate the proposed feature representations with three classifiers: Support Vector Machine (SVM), Gaussian Mixture Model (GMM), and Manifold Ranking (MR). Our experiments on a large-scale dataset of 11k web videos (nearly 450 hours in total) demonstrate that (1) the proposed multimodal feature representation is effective for web video categorization, and (2) SVM outperforms GMM and MR on nearly all modalities.
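To make the pipeline described in the abstract concrete, the sketch below shows one plausible instantiation, not the authors' implementation: visual-word counts reduced by LSA (truncated SVD), fused with a bag-of-words representation of the surrounding text, and classified by a linear SVM. The vocabulary size, latent dimension, synthetic data, and use of scikit-learn are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): visual-word LSA features fused with
# surrounding-text features, classified by a linear SVM.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's 11k-video dataset (assumption).
n_videos = 200
labels = rng.integers(0, 2, n_videos)                          # two genres for brevity
visual_counts = csr_matrix(rng.poisson(1.0, (n_videos, 500)))  # visual-word counts
texts = ["funny cat clip pets home video" if y == 1
         else "news report politics interview" for y in labels]  # title/tags proxy

# Semantic modality: LSA (truncated SVD) over the visual-word vectors.
visual_lsa = TruncatedSVD(n_components=50, random_state=0).fit_transform(visual_counts)

# Surrounding-text modality: bag of words over titles, descriptions, and tags.
text_feats = TfidfVectorizer().fit_transform(texts)

# Early fusion by concatenation, then an SVM classifier.
features = hstack([csr_matrix(visual_lsa), text_feats], format="csr")
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Concatenation is only the simplest fusion strategy; evaluating each modality with its own classifier and combining the scores (late fusion) is an equally plausible reading of the setup, and the abstract's per-modality comparison of SVM, GMM, and MR suggests the modalities were also assessed separately.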