Querying for video events by semantic signatures from few examples

Authors:
Masoud Mazloom;Amirhossein Habibian;Cees G.M. Snoek
Affiliations:
University of Amsterdam, Amsterdam, Netherlands;University of Amsterdam, Amsterdam, Netherlands;University of Amsterdam, Amsterdam, Netherlands
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

We aim to query web video for complex events using only a handful of video query examples, whereas the standard approach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Since it is unknown which similarity metric and query fusion to use in such an event retrieval setting, we perform three experiments on unconstrained web videos from the TRECVID event detection task. The experiments reveal that retrieval with semantic signatures using normalized correlation as the similarity metric outperforms a low-level bag-of-words alternative, that multiple queries are best combined using late fusion with an average operator, and that event retrieval is preferred over event classification when fewer than eight positive video examples are available.
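
The retrieval setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a semantic signature is a plain list of concept detector scores, that "normalized correlation" means the mean-centered cosine (Pearson) similarity, and that late fusion averages the per-query similarity scores before ranking. The function names, video identifiers, and toy scores are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Mean-centered cosine similarity of two equal-length signatures.

    Assumption: this Pearson-style form stands in for the paper's
    "normalized correlation"; the abstract does not give the formula.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    if norm == 0.0:
        # A constant signature has no defined correlation; treat it as no match.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / norm


def late_fusion_rank(queries, collection):
    """Score every video against each query, average the scores, rank descending."""
    fused = {}
    for video_id, signature in collection.items():
        scores = [normalized_correlation(q, signature) for q in queries]
        fused[video_id] = sum(scores) / len(scores)  # average operator
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused


# Toy data: hypothetical detector scores for four concepts.
queries = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.7, 0.1],
]
collection = {
    "video_a": [0.85, 0.15, 0.75, 0.2],  # concept profile close to the queries
    "video_b": [0.1, 0.9, 0.2, 0.8],     # roughly the opposite profile
    "video_c": [0.5, 0.5, 0.5, 0.5],     # flat profile, carries no signal
}

ranking, fused = late_fusion_rank(queries, collection)
print(ranking)
```

In this toy collection the video whose concept profile resembles the query examples ranks first and the one with the opposite profile ranks last. Fusing scores after matching (late fusion), rather than merging the query signatures beforehand, lets each query example contribute its own similarity score before averaging.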