A hybrid framework for detecting the semantics of concepts and context

Authors:
Milind R. Naphade; John R. Smith
Affiliations:
IBM Thomas J. Watson Research Center, Pervasive Media Management Group, Hawthorne, NY (both authors)
Venue:
CIVR'03 Proceedings of the 2nd international conference on Image and video retrieval
Year:
2003

References (5):
The nature of statistical learning theory
A framework for moderate vocabulary semantic visual concept detection. ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
"What is in that video anyway?": In Search of Better Browsing. ICMCS '99 Proceedings of the IEEE International Conference on Multimedia Computing and Systems - Volume 2
Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory
Factor graph framework for semantic video indexing. IEEE Transactions on Circuits and Systems for Video Technology

Cited by (7):
On the detection of semantic concepts at TRECVID. Proceedings of the 12th annual ACM international conference on Multimedia
Interoperability Support between MPEG-7/21 and OWL in DS-MIRF. IEEE Transactions on Knowledge and Data Engineering
Inexpensive fusion methods for enhancing feature detection. Image Communication
Exploiting spatial context constraints for automatic image region annotation. Proceedings of the 15th international conference on Multimedia
Statistical modeling and conceptualization of natural images. Pattern Recognition
The state of the art in image and video retrieval. CIVR'03 Proceedings of the 2nd international conference on Image and video retrieval
Multimedia research challenges for industry. CIVR'05 Proceedings of the 4th international conference on Image and Video Retrieval

Abstract

Semantic understanding of multimedia content requires models for the semantics of concepts, context, and structure. We propose a hybrid framework that can combine discriminant or generative models for concepts with generative models for structure and context. Using the TREC Video 2002 benchmark corpus, we show that robust models can be built for several diverse visual semantic concepts. We use a novel factor graph framework to model inter-conceptual context for 12 semantic concepts of the corpus. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by enforcing high-level constraints. This yields a significant improvement in overall detection performance: enforcing the probabilistic context model improves detection by 22% over baseline concept detection with the global multinet, and by 18% with its factored approximation. This improvement is achieved without any additional training data or separate annotations.
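To make the context-enforcement idea concrete, here is a minimal sketch of sum-product inference on a toy factor graph. It assumes three hypothetical binary concepts (outdoors, sky, indoors), unary factors built from made-up isolated-detector confidences, and hand-set pairwise context potentials. None of these values, nor the helper names (`unary`, `build_neighbors`, `var_to_factor`, `factor_to_var`, `marginal`, `brute_force_marginal`), come from the paper; this is not the authors' multinet or its learned potentials. Because this toy graph is a tree, sum-product returns exact marginals, which the brute-force enumeration confirms.

```python
import itertools
from math import prod

# Toy context model (hypothetical, not from the paper): binary concept
# variables, 0 = absent, 1 = present. Each factor is (scope, table), where
# table maps an assignment tuple over `scope` to a non-negative weight.

def unary(p):
    """Unary factor from a hypothetical isolated-detector confidence p."""
    return {(0,): 1.0 - p, (1,): p}

FACTORS = {
    # Isolated detector outputs (made-up confidences).
    "det_outdoors": (("outdoors",), unary(0.6)),
    "det_sky": (("sky",), unary(0.8)),
    "det_indoors": (("indoors",), unary(0.55)),
    # Hand-set context potentials: sky tends to co-occur with outdoors,
    # while outdoors and indoors rarely co-occur.
    "ctx_outdoors_sky": (("outdoors", "sky"),
                         {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
    "ctx_outdoors_indoors": (("outdoors", "indoors"),
                             {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.1}),
}

def build_neighbors(factors):
    """Map each variable to the ids of the factors it participates in."""
    neighbors = {}
    for fid, (scope, _) in factors.items():
        for v in scope:
            neighbors.setdefault(v, []).append(fid)
    return neighbors

def var_to_factor(var, fid, factors, neighbors):
    """Variable-to-factor message: product of messages from the other factors."""
    msg = [1.0, 1.0]
    for g in neighbors[var]:
        if g != fid:
            m = factor_to_var(g, var, factors, neighbors)
            msg = [msg[x] * m[x] for x in (0, 1)]
    return msg

def factor_to_var(fid, var, factors, neighbors):
    """Factor-to-variable message: sum out the factor's other variables."""
    scope, table = factors[fid]
    others = [v for v in scope if v != var]
    incoming = {v: var_to_factor(v, fid, factors, neighbors) for v in others}
    msg = [0.0, 0.0]
    for assignment in itertools.product((0, 1), repeat=len(scope)):
        values = dict(zip(scope, assignment))
        weight = table[assignment] * prod(incoming[v][values[v]] for v in others)
        msg[values[var]] += weight
    return msg

def marginal(var, factors, neighbors):
    """Normalized belief for `var`; exact when the factor graph is a tree."""
    belief = [1.0, 1.0]
    for g in neighbors[var]:
        m = factor_to_var(g, var, factors, neighbors)
        belief = [belief[x] * m[x] for x in (0, 1)]
    z = sum(belief)
    return [b / z for b in belief]

def brute_force_marginal(var, factors):
    """Reference answer by enumerating every joint assignment."""
    variables = sorted({v for scope, _ in factors.values() for v in scope})
    belief = [0.0, 0.0]
    for assignment in itertools.product((0, 1), repeat=len(variables)):
        values = dict(zip(variables, assignment))
        weight = prod(table[tuple(values[v] for v in scope)]
                      for scope, table in factors.values())
        belief[values[var]] += weight
    z = sum(belief)
    return [b / z for b in belief]

if __name__ == "__main__":
    neighbors = build_neighbors(FACTORS)
    for concept in ("outdoors", "sky", "indoors"):
        bp = marginal(concept, FACTORS, neighbors)[1]
        exact = brute_force_marginal(concept, FACTORS)[1]
        print(f"{concept}: P(present) = {bp:.3f} (enumeration: {exact:.3f})")
```

In this toy setting, the context factors lower the indoors posterior from the detector's 0.55 to about 0.27 and raise the outdoors and sky posteriors. This is the kind of error correction through high-level constraints that the abstract describes. Messages are recomputed recursively instead of being scheduled and cached, which keeps the sketch short but is only practical for small trees. A factor graph with cycles would need iterated (loopy) message passing, which gives approximate rather than exact marginals.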