A multi-modal system for the retrieval of semantic video events

  • Authors:
  • Arnon Amir, Sankar Basu, Giridharan Iyengar, Ching-Yung Lin, Milind Naphade, John R. Smith, Savitha Srinivasan, Belle Tseng

  • Affiliations:
  • IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY; IBM Almaden Research Center, 650 Harry Road, San Jose, CA; IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY

  • Venue:
  • Computer Vision and Image Understanding (special issue on event detection in video)
  • Year:
  • 2004

Abstract

A framework for event detection is proposed in which events, objects, and other semantic concepts are detected from video using trained classifiers. The classifiers automatically annotate video with semantic labels, and these labels are in turn used to search for new, untrained types of events and semantic concepts. The novelty of the approach lies in (1) the semi-automatic construction of event models from feature descriptors and (2) the integration of content-based and concept-based querying in the search process. Speech retrieval is applied independently, and its results are combined with those of the visual search. Results on the search benchmark of the NIST TREC 2001 Video Track are reported, and the experience gained and future work are discussed.
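The abstract describes merging three independent ranked result sources: a content-based query, a concept-based query over the classifier labels, and speech retrieval. A minimal sketch of one plausible late-fusion step is shown below; the class, function, field names, and weights are illustrative assumptions, not details taken from the paper.

```python
"""Late-fusion ranking sketch: combine per-shot scores from three
retrieval modalities with a weighted sum and rank by the fused score.
All identifiers and weights here are hypothetical."""

from dataclasses import dataclass
from typing import List


@dataclass
class ShotScores:
    shot_id: str
    content: float  # content-based similarity (e.g., color/texture features)
    concept: float  # concept-based score from trained semantic classifiers
    speech: float   # speech retrieval score over the transcript


def fuse(shots: List[ShotScores],
         w_content: float = 0.3,
         w_concept: float = 0.4,
         w_speech: float = 0.3) -> List[ShotScores]:
    """Return shots ranked by a weighted linear combination of scores."""
    return sorted(
        shots,
        key=lambda s: (w_content * s.content
                       + w_concept * s.concept
                       + w_speech * s.speech),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        ShotScores("shot_001", content=0.82, concept=0.40, speech=0.10),
        ShotScores("shot_002", content=0.35, concept=0.90, speech=0.75),
        ShotScores("shot_003", content=0.55, concept=0.20, speech=0.05),
    ]
    for s in fuse(candidates):
        print(s.shot_id)  # prints shot_002, shot_001, shot_003
```

Linear weighted fusion is only one way to combine modality scores; the paper itself may use a different combination scheme, and in practice each modality's scores would need normalization to a common range before fusing.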