We are not equally negative: fine-grained labeling for multimedia event detection

  • Authors:
  • Zhigang Ma (University of Trento, Trento, Italy); Yi Yang (Carnegie Mellon University, Pittsburgh, PA, USA); Zhongwen Xu (Zhejiang University, Hangzhou, China); Nicu Sebe (University of Trento, Trento, Italy); Alexander G. Hauptmann (Carnegie Mellon University, Pittsburgh, PA, USA)

  • Venue:
  • Proceedings of the 21st ACM international conference on Multimedia
  • Year:
  • 2013

Abstract

Multimedia event detection (MED) is an effective technique for video indexing and retrieval. Current classifier training for MED treats all negative videos equally. However, many negative videos resemble the positive videos to different degrees. Intuitively, we could capture more informative cues from the negative videos if we assigned them fine-grained labels, thereby benefiting classifier learning. To this end, we apply a statistical method to both the positive and negative examples to identify the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they are treated differently and exploited more effectively. The resulting fine-grained labels may not be accurate enough to characterize the negative videos. Hence, we propose to jointly optimize the fine-grained labels with knowledge from the visual features and the attribute representations, which benefits both. Our model obtains two kinds of classifiers, one from the attributes and one from the features, each incorporating the informative cues from the fine-grained labels. The outputs of both classifiers on the testing videos are fused for detection. Extensive experiments on the challenging TRECVID MED 2012 development set validate the efficacy of our proposed approach.
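
The following is a minimal sketch, not the authors' implementation, of the pipeline the abstract outlines: selecting decisive attributes with a simple statistic, assigning soft fine-grained labels to negatives, training one classifier on features and one on attributes, and fusing their scores. The synthetic data, the two-sample statistic, the top-5 attribute cutoff, the [0, 0.5] label scale, and the use of Ridge regression are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data: low-level visual features X and attribute representations A.
n_pos, n_neg, n_feat, n_attr = 50, 200, 128, 20
X_pos = rng.normal(1.0, 1.0, (n_pos, n_feat))
X_neg = rng.normal(0.0, 1.0, (n_neg, n_feat))
A_pos = rng.normal(1.0, 1.0, (n_pos, n_attr))
A_neg = rng.normal(0.0, 1.0, (n_neg, n_attr))

# 1) Pick "decisive" attributes: those whose means differ most between
#    positives and negatives (a simple two-sample statistic as a stand-in
#    for the paper's statistical selection).
diff = np.abs(A_pos.mean(0) - A_neg.mean(0))
pooled_std = np.sqrt(A_pos.var(0) / n_pos + A_neg.var(0) / n_neg) + 1e-8
t_stat = diff / pooled_std
decisive = np.argsort(t_stat)[-5:]          # top-5 attributes (assumed cutoff)

# 2) Assign fine-grained labels to negatives: negatives whose decisive
#    attributes resemble the positive prototype receive labels closer to 1.
proto = A_pos[:, decisive].mean(0)
sim = -np.linalg.norm(A_neg[:, decisive] - proto, axis=1)
sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
y_neg_fine = 0.5 * sim                      # soft labels in [0, 0.5] (assumed scale)

y = np.concatenate([np.ones(n_pos), y_neg_fine])
X = np.vstack([X_pos, X_neg])
A = np.vstack([A_pos, A_neg])

# 3) Train one classifier on features and one on attributes against the
#    fine-grained labels (Ridge regression stands in for the paper's joint model,
#    which also refines the labels during learning).
clf_feat = Ridge(alpha=1.0).fit(X, y)
clf_attr = Ridge(alpha=1.0).fit(A, y)

# 4) Late fusion of the two scores on a test video.
x_test = rng.normal(1.0, 1.0, (1, n_feat))
a_test = rng.normal(1.0, 1.0, (1, n_attr))
score = 0.5 * clf_feat.predict(x_test) + 0.5 * clf_attr.predict(a_test)
print("fused detection score:", float(score[0]))
```

In this sketch the fine-grained labels are fixed before training; the paper's joint optimization instead updates them together with the two classifiers, which the abstract credits with correcting inaccurate initial labels.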