User-trainable video annotation using multimodal cues

  • Authors:
  • C.-Y. Lin, M. Naphade, A. Natsev, C. Neti, J. R. Smith, B. Tseng, H. J. Nock, W. Adams

  • Affiliations:
  • IBM T. J. Watson Research Center, NY (all authors)

  • Venue:
  • Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2003

Abstract

This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system for automatically annotating user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building classifiers in a score space defined by a pre-deployed set of multimodal models. Results show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus. An interesting side result is that speech-only models give performance comparable to our best video-only models for detecting visual concepts such as "outdoors", "face" and "cityscape".
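To make the score-space idea concrete: each shot is first scored by a fixed bank of pre-deployed multimodal detectors, and those scores become the feature vector on which a lightweight classifier for the new, user-defined concept is trained. The sketch below is illustrative only; the detector names, the toy data, the use of scikit-learn, and the choice of a linear SVM are assumptions, not details taken from the paper.

```python
# Minimal sketch of learning a user-defined concept in the "score space"
# of pre-deployed multimodal models. Detector names, data, and the linear
# SVM are hypothetical choices for illustration.
import numpy as np
from sklearn.svm import SVC

# Fixed bank of pre-deployed detectors (visual, audio, speech-based, ...).
DETECTORS = ["outdoors", "face", "cityscape", "speech", "music"]

def score_vector(shot_scores: dict) -> np.ndarray:
    """Map one shot's detector scores into the fixed score space."""
    return np.array([shot_scores.get(d, 0.0) for d in DETECTORS])

# Toy labeled shots for a new concept outside the pre-deployed set
# (say, "beach"): 1 = concept present, 0 = absent.
train_shots = [
    ({"outdoors": 0.9, "face": 0.2, "cityscape": 0.1, "speech": 0.4, "music": 0.1}, 1),
    ({"outdoors": 0.8, "face": 0.1, "cityscape": 0.0, "speech": 0.3, "music": 0.2}, 1),
    ({"outdoors": 0.1, "face": 0.7, "cityscape": 0.6, "speech": 0.8, "music": 0.0}, 0),
    ({"outdoors": 0.2, "face": 0.5, "cityscape": 0.9, "speech": 0.6, "music": 0.1}, 0),
]

X = np.stack([score_vector(scores) for scores, _ in train_shots])
y = np.array([label for _, label in train_shots])

# Train the concept classifier entirely in the score space.
clf = SVC(kernel="linear").fit(X, y)

# Annotate a new shot with the user-defined concept.
new_shot = {"outdoors": 0.85, "face": 0.15, "cityscape": 0.05, "speech": 0.5, "music": 0.0}
x_new = score_vector(new_shot).reshape(1, -1)
print("predicted label:", clf.predict(x_new)[0])
print("decision margin:", clf.decision_function(x_new)[0])
```

Because only the small score-space classifier is retrained, a user can define a new concept from a handful of labeled shots without retraining the underlying multimodal detectors.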