Speech technology plays a key role in video semantic indexing

  • Authors: Koichi Shinoda
  • Affiliations: Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
  • Venue: Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis
  • Year: 2012

Abstract

Video semantic indexing is a core task in content-based video retrieval (CBVR): a user submits a text query for an object or a scene to a search system, and the system returns the video shots that contain that object or scene. We introduce an emerging framework for this task that relies heavily on statistical speaker verification and adaptation techniques. It employs Gaussian-mixture-model (GMM) supervectors and support vector machines (SVMs) to robustly detect a large variety of objects and scenes in video. It has shown excellent performance in the Semantic Indexing task of the TRECVID 2011 workshop, where a large archive of consumer-produced Internet videos is used for evaluation.
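The GMM supervector idea mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a diagonal-covariance background GMM, MAP adaptation of the component means only, a hypothetical relevance factor `tau`, and the normalization by mixture weights and variances that is common in speaker verification. The function names and parameters are illustrative.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log-density of each row of X under each diagonal Gaussian.

    X: (n, d), means: (k, d), variances: (k, d). Returns an (n, k) array.
    """
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]            # (n, k, d)
    log_det = np.sum(np.log(variances), axis=1)         # (k,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + mahal)


def gmm_supervector(X, weights, means, variances, tau=20.0):
    """MAP-adapt the background GMM means to the features X of one shot
    and stack them into a single vector of length k * d.

    tau is an assumed relevance factor: larger values keep the adapted
    means closer to the background model.
    """
    # Posterior probability of each component for each feature vector.
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)             # (n, k)

    # Zeroth- and first-order statistics per component.
    n_k = post.sum(axis=0)                              # (k,)
    first = post.T @ X                                  # (k, d)

    # MAP update of the means only.
    adapted = (tau * means + first) / (tau + n_k)[:, None]

    # Assumed normalization: sqrt(weight) * mean / sqrt(variance).
    normalized = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return normalized.ravel()
```

Under these assumptions, each video shot yields one fixed-length supervector regardless of how many local features it contains, so one two-class SVM per object or scene concept can then be trained on the supervectors of labeled shots.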