A multi-modal system for the retrieval of semantic video events

  • Authors:
  • Arnon Amir, Sankar Basu, Giridharan Iyengar, Ching-Yung Lin, Milind Naphade, John R. Smith, Savitha Srinivasan, Belle Tseng

  • Affiliations:
  • IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY

  • Venue:
  • Computer Vision and Image Understanding (special issue on event detection in video)
  • Year:
  • 2004

Abstract

A framework for event detection is proposed in which events, objects, and other semantic concepts are detected from video using trained classifiers. The classifiers automatically annotate video with semantic labels, and these labels are in turn used to search for new, untrained types of events and semantic concepts. The novelty of the approach lies in (1) the semi-automatic construction of event models from feature descriptors and (2) the integration of content-based and concept-based querying in the search process. Speech retrieval is applied independently, and its results are combined with those of the visual search. Results on the search benchmark of the NIST TREC 2001 Video Track are reported, and the experience gained and future work are discussed.
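The abstract describes merging three independent ranked result sources: a content-based query, a concept-based query over the classifier labels, and speech retrieval. A minimal sketch of one plausible late-fusion step is shown below; the class, function, field names, and weights are illustrative assumptions, not details taken from the paper.

```python
"""Late-fusion ranking sketch: combine per-shot scores from three
retrieval modalities with a weighted sum and rank by the fused score.
All identifiers and weights here are hypothetical."""

from dataclasses import dataclass
from typing import List


@dataclass
class ShotScores:
    shot_id: str
    content: float  # content-based similarity (e.g., color/texture features)
    concept: float  # concept-based score from trained semantic classifiers
    speech: float   # speech retrieval score over the transcript


def fuse(shots: List[ShotScores],
         w_content: float = 0.3,
         w_concept: float = 0.4,
         w_speech: float = 0.3) -> List[ShotScores]:
    """Return shots ranked by a weighted linear combination of scores."""
    return sorted(
        shots,
        key=lambda s: (w_content * s.content
                       + w_concept * s.concept
                       + w_speech * s.speech),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        ShotScores("shot_001", content=0.82, concept=0.40, speech=0.10),
        ShotScores("shot_002", content=0.35, concept=0.90, speech=0.75),
        ShotScores("shot_003", content=0.55, concept=0.20, speech=0.05),
    ]
    for s in fuse(candidates):
        print(s.shot_id)  # prints shot_002, shot_001, shot_003
```

Linear weighted fusion is only one way to combine modality scores; the paper itself may use a different combination scheme, and in practice each modality's scores would need normalization to a common range before fusing.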