User-trainable video annotation using multimodal cues

  • Authors:
  • C.-Y. Lin, M. Naphade, A. Natsev, C. Neti, J. R. Smith, B. Tseng, H. J. Nock, W. Adams

  • Affiliations:
  • IBM T. J. Watson Research Center, NY (all authors)

  • Venue:
  • Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2003

Abstract

This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system for automatically annotating user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building classifiers in a score space defined by a pre-deployed set of multimodal models. Results show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus. An interesting side result is that speech-only models give performance comparable to our best video-only models for detecting visual concepts such as "outdoors", "face" and "cityscape".
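To make the score-space idea concrete: each shot is first scored by a fixed bank of pre-deployed multimodal detectors, and those scores become the feature vector on which a lightweight classifier for the new, user-defined concept is trained. The sketch below is illustrative only; the detector names, the toy data, the use of scikit-learn, and the choice of a linear SVM are assumptions, not details taken from the paper.

```python
# Minimal sketch of learning a user-defined concept in the "score space"
# of pre-deployed multimodal models. Detector names, data, and the linear
# SVM are hypothetical choices for illustration.
import numpy as np
from sklearn.svm import SVC

# Fixed bank of pre-deployed detectors (visual, audio, speech-based, ...).
DETECTORS = ["outdoors", "face", "cityscape", "speech", "music"]

def score_vector(shot_scores: dict) -> np.ndarray:
    """Map one shot's detector scores into the fixed score space."""
    return np.array([shot_scores.get(d, 0.0) for d in DETECTORS])

# Toy labeled shots for a new concept outside the pre-deployed set
# (say, "beach"): 1 = concept present, 0 = absent.
train_shots = [
    ({"outdoors": 0.9, "face": 0.2, "cityscape": 0.1, "speech": 0.4, "music": 0.1}, 1),
    ({"outdoors": 0.8, "face": 0.1, "cityscape": 0.0, "speech": 0.3, "music": 0.2}, 1),
    ({"outdoors": 0.1, "face": 0.7, "cityscape": 0.6, "speech": 0.8, "music": 0.0}, 0),
    ({"outdoors": 0.2, "face": 0.5, "cityscape": 0.9, "speech": 0.6, "music": 0.1}, 0),
]

X = np.stack([score_vector(scores) for scores, _ in train_shots])
y = np.array([label for _, label in train_shots])

# Train the concept classifier entirely in the score space.
clf = SVC(kernel="linear").fit(X, y)

# Annotate a new shot with the user-defined concept.
new_shot = {"outdoors": 0.85, "face": 0.15, "cityscape": 0.05, "speech": 0.5, "music": 0.0}
x_new = score_vector(new_shot).reshape(1, -1)
print("predicted label:", clf.predict(x_new)[0])
print("decision margin:", clf.decision_function(x_new)[0])
```

Because only the small score-space classifier is retrained, a user can define a new concept from a handful of labeled shots without retraining the underlying multimodal detectors.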